Guideline

Subject: Massively Parallel Sequencing Implementation Guidelines
Approval Date: May 2014, May 2015
Review Date: May 2018
Review Committee: Genetic Advisory Committee
Number: 3/2014

Common Abbreviations:

BAM File: Binary form of a SAM File
CNV: Copy number variant such as a duplication or deletion
CSV File: Comma separated values (sometimes TSV: Tab separated values); generally able to be opened with a spreadsheet program
FASTQ File: A text file for storing sequence data and associated quality scores
Indel: Short insertion or deletion (typically less than 200bp)
MPS: Massively Parallel Sequencing (“Next Generation Sequencing”)
SAM File: A tab-delimited text file that contains sequence alignment data
SNV: Single nucleotide variant
Ti/Tv: Transition / Transversion Ratio
VCF File: Variant Call Format File

Background

This document is aimed at diagnostic laboratories preparing to implement next generation sequencing based genomic methods. At the time of writing (second quarter of 2015), there are no NPAAC standard publications specifically aimed at next generation sequencing in Australian diagnostic laboratories. Australian diagnostic laboratories should adhere to these Guidelines to ensure that the high quality of medical genetic testing across Australia is maintained. It is hoped that this document may provide the basis for a standard in the future.

The first version of this guideline was launched at the Royal College of Pathologists of Australasia (RCPA) Annual Scientific Meeting, Pathology Update, in February 2013 in Melbourne, Victoria. This document is the second version, updated to reflect the developing knowledge and changes in this area of testing over the last 2 years. The updated guidelines have been drafted under the auspices of the Genetics Advisory Committee of the College, with the Chair of the Committee, Melody Caramins, acting as Editor-In-Chief.
As per the first version, separate writing committees were formed to address each of the topics/chapters. The writing committees had nominees from the RCPA discipline of Genetics, the Faculty of Science of the RCPA, the HGSA, and other experts.

Chapter 1 Ethical and legal Issues

1. Introduction

Medical testing by genomic methods shares many ethical, legal and social issues with other forms of clinical investigation. Genomic methods are simply methods and do not necessarily introduce new issues. However, genomic testing is marked out by the opportunity and challenge of scale. Existing issues of informed consent, incidental findings, the right not to know, family studies and re-contacting are potentially magnified due to the volume of information that these tests yield.

Comprehensive genomic analyses (e.g. whole genome sequencing or exome sequencing) can generate information pertinent to the management of diseases other than the targeted clinical condition being investigated. Genomic testing can, therefore, be viewed as comprising both a diagnostic and a screening function. The scale of this overlap in test function is unprecedented. The implications of this complex testing scenario for the individual patient will require clear explanation in order to obtain informed consent.

Genomic testing should not be performed without careful consideration of these broader issues. For this reason, this chapter on ethical and legal implications of genomic testing precedes the chapters detailing the analytical, interpretive, reporting, and resource requirements for such testing.

2. Medical Responsibilities

2.1 Medical genomic testing is subject to ethical guidelines for medical practice and existing NPAAC guidelines.

There is an ethical dimension to all medical testing. Patients expect that they will be offered tests that are safe and provide information that is accurate and useful in the management of their condition.
If tests are to be used in clinical management, it is expected that they will have undergone an evaluation of the evidence for their safety, analytic and clinical validity, and clinical utility. Tests which have not been evaluated in this fashion, or where the evidence base is weak, may be used for research purposes, but any reports should be identified as research only, or not validated to clinical standards (as appropriate), and the patient will need to give informed consent to be part of a research study.

Guidance on ethical issues relating specifically to genetic testing comprises foundation documents and a rapidly evolving peer reviewed literature, some of which are listed below.

Medical genomic testing is subject to the existing NPAAC guidelines. When genomic testing is used for targeted DNA sequencing, i.e. analysis of genes known to cause the patient's current disease, it falls within current guidelines for genetic testing by Sanger sequencing. Genomic testing for a clinical condition with a suspected genomic causation but with an unknown a priori genetic basis, i.e. whole exome or whole genome sequencing, falls more within current guidelines for microarray testing.

2.1.1 Resources

• ALRC-NHMRC (2003): Essentially yours – the protection of human genetic information in Australia.
• NHMRC (2010): Medical genetic testing: Information for health professionals.
• NPAAC (2013): Requirements for Medical Pathology Services.
• NPAAC (2013): Requirements for medical testing of human nucleic acids.
• Australian Medical Council. Good medical practice: a code of conduct for doctors in Australia.
• Australian Medical Association Code of Ethics.
• Standardisation in clinical laboratory medicine: an ethical reflection. Bossuyt, Louche, & Wiik (2008).
• ACCE model process for evaluating genetic tests. CDC (2004).
• American College of Medical Genetics (2012): Points to Consider in the Clinical Application of Genomic Sequencing.
• PHG Foundation (2011): Next steps in the sequence. The implications of whole genome sequencing for health in the UK.
• Health Council of the Netherlands: The thousand dollar genome, an ethical exploration. The Hague: Centre for Ethics and Health, 2010.
• Evans and Rothschild, Return of results: not that complicated? Genet Med 14(4):358-60; 2012.
• HGSA Commentary of ACMG recommendations 2014.
• Incidental findings in clinical genomics: a clarification. American College of Medical Genetics and Genomics (2013). Genetics in Medicine 15(8); 2013.

2.2 There should be an explicit medical consultation framework within which clinician and laboratory operate whenever genomic testing is undertaken.

While all pathology test requests imply a consultation between the referring clinician and the laboratory professional supervising the test, this should be an explicit requirement of referrals for genomic testing. The pathology team needs to know the specific clinical question being asked of testing to allow planning of the analytical processes and to facilitate interpretation of the analytical result. The referring clinician should provide adequate clinical and laboratory information to assist in these decisions.

Laboratory Directors should clearly distinguish clinical testing with a strong evidence base from testing for research purposes in which there is an evolving evidence base. Laboratories should clearly state their policy on reporting incidental findings and variants of unknown significance.

The referring clinician should know what analytical approach will be undertaken and the policies of the pathology service with respect to reporting of findings, incidental findings (including carrier status for recessive disorders), storage of data, and links with research bodies and biobanks, so that this information can be conveyed to the patient during the informed consent process.
The pathology team and the clinician may wish to vary procedures regarding information provided, consent, and testing performed on a patient-by-patient basis.

2.3 Patients should receive counselling prior to genomic testing by a genetic counsellor or relevant medical specialist.

There is no consensus guideline on best practice with respect to the components and communication elements of counselling for genomic testing; however, the core principles of genetic testing apply and should be discussed. These core principles include the requirement for a discussion of both expected results and incidental findings, that the interpretation of results requires reference to population and disease specific genomic databases, and that the interpretation of results may alter with increasing knowledge. Information about the storage of data, and protocols for re-analysis and call-back, should be communicated to the patient during the process of obtaining informed consent. The pre-test counselling should also clarify that results will be conveyed to the patient, discussed during post-test counselling, and distributed to specified clinicians and records.

These principles also apply in settings in which consent is provided by an appropriate proxy for the patient, e.g. in a paediatric setting.

The HGSA commentary emphasises the importance of patient autonomy:

The HGSA Code of Ethics states that individual autonomy should be respected by “actively promoting informed decision-making, which is not coerced, for all involved by providing accurate and balanced information and an opportunity to deliberate, based on individual values and beliefs”. The ACMG recommends pre and post-test counselling, and note that it could be considered coercive with respect to predictive information to only offer a choice between receiving reported incidental findings and not having genomic testing at all.

2.4 A formal consent process should be in place and patients should consent to testing for a medical service.
A standard consent form should be developed which is acceptable to all jurisdictions. Information, consent, and test ordering forms for various genomic assays are available online from the Baylor College of Medicine (USA) and Ambry Genetics (USA). Please note that these resources are examples only, and no commitment is made that these are suitable for specific purposes, times, or places.

2.4.1 Resources

Examples of genomic information sheets and consent forms:

• Genomic Information Sheet
• General Genomic Consent Form
• Cancer Exome Consent Form
• Exome Sequencing Information Sheet
• General Exome Consent Form
• Cancer Exome or Panel Consent Form
• SA Pathology informed consent for genetic testing
• SA Pathology informed consent for genomic testing

2.5 The patient should provide specific consent to allow the contribution of their de-identified data to public databases.

The development of population reference ranges for laboratory analytes has been a fundamental process in the development of diagnostic testing. The clinical validity and utility of tests which identify deviations from population reference ranges can be assessed, and an evidence base for the use of the test can be built. This fundamental principle also applies to genomic testing, i.e. documenting the frequency and clinical relevance of variants in different populations. All patients seeking genomic testing should, therefore, be asked for permission during the consent process to include de-identified results in publicly available databases for the common good.

2.6 Patients should receive a clear written record of the policy regarding the reporting of incidental findings.

As yet, there is no consensus on whether and what incidental findings should be reported to the patient. Patients may have the right to know, to know of some, or not to know about incidental findings.
Doctors have an obligation both to do what the informed patient has requested and to advise the patient of any serious health risk revealed by testing. They also have an obligation to the blood relatives of the patient. These ethical dilemmas are made more complex by the fact that the significance of many findings is unknown and that the classification of benign and pathogenic mutations may be unreliable and may alter over time as new evidence accumulates. The recommended use of targeted analysis, where it does not interfere with reaching a diagnosis, is a pragmatic approach to minimise these ethical dilemmas.

One approach is to classify findings into groups (or “bins”) as a function of the risk of disease and the existence of effective therapy. A process has been proposed in which stakeholders would determine which genes belong in the medically actionable bin (see Resources below). Achieving consensus on the management of incidental findings is likely to be a complex process, as studies show a wide range of views among both patients and healthcare professionals as to what constitutes valuable information. The construction of databases with phenotypic annotation of genomic variants and evidence based research will be critical to the success of this approach.

It is current practice for doctors to seek consent from patients to notify them of actionable mutation results identified during testing. The list of these mutations needs to be documented and regularly reviewed. Should patients decline to be notified, many doctors have taken the decision not to proceed with genomic testing and have offered other diagnostic pathways.

In this period while debate and guideline development are in progress, doctors are using standard practices (clinical reasoning, the advice of peers and local ethics committees) to develop their own approach to these issues.
Whatever approach they choose, clear verbal and written communication of the policy about what findings will and will not be disclosed should be provided. As this is an emerging technology, there is, as yet, no case law regarding medical liability of referring clinicians or pathologists in this field.

2.6.1 Resources

• Berg JS, Khoury MJ, Evans JP. Deploying whole genome sequencing in clinical practice and public health: meeting the challenge one bin at a time. Genet Med 2011 Jun;13(6):499-504.
• Evans and Rothschild, Return of results: not that complicated? Genet Med 14(4):358-60; 2012.
• Managing incidental and pertinent findings from WGS in the 100,000 Genome Project. A discussion paper from the PHG Foundation, April 2013.
• Rigter et al. Reflecting on Earlier Experiences with Unsolicited Findings: Points to Consider for Next-Generation Sequencing and Informed Consent in Diagnostics. Hum Mutat 2013 Jun 19.

2.7 Results from genomic studies should be conveyed to the patient in the context of post-test genetic counselling by an appropriately qualified expert.

2.7.1 Resources

• NHMRC 2010 Medical genetic testing: Information for health professionals. Australian Health Ethics Committee.
• NPAAC (2012) Requirements for medical testing of human nucleic acids. Refer to Appendices regarding ethical categorization of tests.
• Biesecker, Opportunities and challenges for the integration of massively parallel genomic sequencing into clinical practice: lessons from the ClinSeq project. Genet Med 14(4):393-398; 2012.
• Green et al, Exploring concordance and discordance for return of incidental findings from clinical sequencing. Genet Med 14(4):405-10; 2012.
• The Collaborative Institutional Training Initiative (CITI): a resource for research ethics education.

3. Laboratory Responsibilities

3.1 The accountable laboratory professional should sight a copy of the consent form before testing.
The existing NPAAC Requirements for Medical Testing of Human Nucleic Acids (see Appendix A) distinguishes between two classes of DNA tests:

• Level 1 tests (the default classification; includes diagnostic testing and neonatal screening) and
• Level 2 tests (DNA testing for which specialised knowledge is needed for the DNA test to be requested, and for which professional genetic counselling should precede and accompany the test; this includes predictive and pre-symptomatic tests).

The document notes that specific written consent and counselling issues are associated with Level 2 tests, and assigns responsibility to the laboratory director to document consent and defer testing if there is a concern about the consent process.

Although diagnostic testing using genomic methods could be regarded as a Level 1 test, genomic testing should be regarded as a Level 2 test because of the complexity of the issues associated with consent and variants of unknown significance. Pathologists and scientists need to be assured that the patient has undergone pre-test counselling by an accredited genetic counsellor or relevant medical specialist, that counselling has included discussion of expected outcomes of testing and the likelihood and type of incidental findings, and that the patient has given informed consent.

The NPAAC document does not require that the consent for Level 2 testing be sighted, only that the accountable laboratory professional knows that such consent has been provided. However, to ensure explicit consistency in the pre-analytical, analytical, and post-analytical phases of a genomic test (see Sections above and below), the laboratory professional may request to sight a copy of the completed consent form prior to testing. The laboratory should request a copy of the consent form for whole exome or whole genome sequencing.

3.2 Data generated by genomic testing are subject to the privacy legislation operating in each jurisdiction.
Under the Australian Privacy Act 1988 (Cth) (Privacy Act), the health service provider in the private sector is responsible for the security and privacy of a patient’s health information. Commonwealth, State or Territory laws apply to health service providers in the public sector.

There are specific provisions under the Federal Privacy Act to allow medical practitioners to disclose a patient’s genetic information without their consent to a relative if there is a serious threat to the life, health or safety of the relative and the use or disclosure is necessary to lessen or prevent that threat (APP 6). Note that compliance with the NHMRC Guidelines (see below) is a legal requirement for anyone wishing to utilize this legal provision.

There is debate as to whether additional legislative security is necessary because of the identifiability of data generated by genomic testing. An alternative view has been put that it is not the data or DNA sequences per se that are identifiable, but rather that identifiability only occurs at the time that the genome sequence is matched with that of the patient. As use of the sample without the patient’s permission is illegal under Australian privacy legislation, any matching of genomic data with a particular patient is illegal and the perpetrator is subject to existing legal penalties. Further, under the Privacy Act, any unsolicited or non-legally collected personal information should be de-identified and possibly destroyed.

3.2.1 Resources

• Australian Privacy Principles.
• Privacy Act 1988 (Cth).
• Privacy Information Sheets.
• Privacy in the private health sector.
• Fact sheet on management of genetic data in the private sector.
• NHMRC Guidelines on disclosure of genetic information without consent.
• NHMRC National Statement on Ethical Conduct in Human Research 2007 – updated 2009.
• RCPA Guideline: Managing Privacy Information in Laboratories https://www.rcpa.edu.au/getattachment/a631a573-0d07-4bd4-ba67cfe545618dd1/Managing-Privacy-Information-in-Laboratories.aspx

3.3 DNA samples and laboratory records should be retained in accordance with existing NPAAC requirements.

The existing NPAAC standard for samples submitted for medical testing specifies the retention of diagnostic material for “three months from the date of issue of the report for an individual or for completion of a family study or for completion of testing: whichever of the three periods is longest”. It is reasonable to apply this to samples submitted for genomic testing.

The document specifies that “The copy of the original report, or ability to reprint the information content of an original report has a minimum retention time of 100 years.” This may need to be altered to accommodate genomic testing. It should be noted that the standard specifies that only reports be kept indefinitely; the raw data files from genomic testing are very large and their storage poses a significant cost and logistical burden. The Privacy Act advises that keeping health information longer than is necessary may increase the risk of privacy breaches.

A 3-year study at South Eastern Area Laboratory Service (SEALS) in NSW has commenced to determine the need to access archived genomics reports. This is likely to inform laboratory practice. In the interim, it is recommended that laboratories retain at least an aliquot of the DNA and the corresponding VCF file for 3 years.

3.3.1 Resources

• NPAAC Requirements for the retention of laboratory records and diagnostic material (Sixth Edition 2013)

3.4 Data generated by genomic testing should be stored in accordance with the privacy legislation operating in each jurisdiction.

Standards Australia AS/NZS ISO/IEC 17799:2001 and AS/NZS 7799.2:2000 incorporate electronic storage of medical records.
There are no specific standards for the storage of genetic information; however, the NPAAC document “Requirements for Information communication” and the RCPA “Standards for clinical databases of genetic variants” provide useful guidance.

Enforcement provisions for misuse, loss, or disclosure without consent of stored health information are legislated under Federal and State privacy legislation. There is no specific Australian legislation around genetic information. If health data from Australian patients are analysed or stored on computing platforms which are physically located in another country, they are subject to Australian Privacy legislation. If data are lost, disclosed or stolen, the Australian entity that transmitted the information overseas may be found liable.

3.4.1 Resources

• See also section on IT infrastructure

3.5 Upon request, the Laboratory Director should give patients access to the personal genomic information for which consent has been given.

Provisions are made for some exceptions in the Australian Privacy Principles (APP 12). Relevant examples include:

• A serious threat to life, health or safety of an individual or to public health or public safety; or
• Giving access would have an unreasonable impact on the privacy of other individuals; or
• Various exceptions relating to current or anticipated legal proceedings or under certain legal authorities.

Procedures may need to be put in place to address specific cultural sensitivities relating to the access by patients to their genomic data.

3.5.1 Resources

• Guidelines from the Office of the Australian Information Commissioner.
• RCPA Guideline: Release of Pathology Results to Patients.
• Guidelines for Researchers on Health Research Involving Māori 2010 VERSION 2.
• NHMRC (2007): National Statement on Ethical Conduct in Human Research. See Section 4.7 re working with people of Aboriginal or Torres Strait Islander heritage.
3.6 The Director of the laboratory should observe the relevant provisions of privacy legislation in that jurisdiction should the business circumstances of the laboratory change.

For a laboratory operating in the private sector (and hence falling under the requirements of the Privacy Act), the relevant Australian Privacy Principles include numbers 3, 6, 11 and 12. Where ownership of the health service provider changes (e.g. amalgamation, takeover, closure) but the original purpose for which the information was used does not change, the health information stays with the organisation and there is no requirement to inform or seek consent from the patient. However, if the new health service provider intends to use the information for purposes other than that for which it was collected, the new provider may need to seek consent from the patient. Where a health service provider’s business ceases altogether, arrangements will need to be put in place to securely transfer and store the patient’s health information.

3.6.1 Resources (as per 3.5.1)

4. Scope of Testing

4.1 Throughout the process of testing, there should be explicit distinctions between targeted diagnostic testing i.e. of selected genes, whole exome or genome sequencing, screening of an unaffected person, and research studies.

An understanding of the role of genomic testing in investigation of disease is rapidly evolving. Genomic testing may be indicated in the investigation of patients with a Mendelian phenotype or family history which strongly implicates a genetic aetiology. The case for targeted diagnostic testing is clear where the phenotype is consistent with a known disease in which mutations in a number of genes are known to be causative. Genomic testing may also play a role in the investigation of families with a Mendelian phenotype where the specific genetic aetiology is not established, i.e. genome-wide diagnostic testing.
It may also play a role in the investigation of multiple affected individuals from different families or single individuals with very rare genetic disorders, where randomised clinical trials to assess clinical utility and other measures of efficacy of genomic testing are not possible. These referrals are based on clinical judgment. This approach is analogous to investigations such as cytogenetic analysis, microarrays or tissue biopsy where the target pathology is unknown.

There is debate about the use of genomic methods in preconception carrier screening for relevant mutations, prenatal screening, and as a first tier approach for newborn screening. A recent report from the Foundation for Genomics and Population Health in the UK concludes that “Extensive interrogation of genomic data for preventive purpose is not recommended.”

These different purposes of genomic testing involve different ethical considerations as well as significant differences in analysis and interpretation. The clinician, laboratory, and patient should have a clear understanding of the purpose and scope of a test, and this should be reflected in pre-test counselling and consent, in the analysis and interpretation, and in the reporting and distribution of a genomic test.

4.2 When genomic testing is used in the investigation of a heritable disease, the analytical approach should be targeted so that only genes relevant to the specific disease phenotype are analysed, provided that this approach does not compromise test performance.

Current recommendations are that the analytic approach be clinically targeted at a candidate gene or set of genes which are known to cause the disease phenotype in question. “Genome-wide” diagnostic testing should only be considered if it is clear that testing with a narrower scope (using filters) will yield insufficient results.
Where the phenotype is non-specific or not recognized as a particular syndrome, wider capture of data with targeted data interrogation that can be performed in a tiered manner is useful.

4.2.1 Resources

• Health Council of the Netherlands: The thousand dollar genome, an ethical exploration. The Hague: Centre for Ethics and Health, 2010. See Section 8.2 (p 48).

4.3 Patients should provide consent if their genomic data derived from clinical testing are to be used for research purposes.

There are no specific conditions to be applied to the research use of samples or data that had been obtained for diagnostic testing using genomic methods. Collaborations between pathology practices providing diagnostic testing and researchers take several forms and are subject to the provisions of privacy legislation in the various jurisdictions and to the NHMRC National Statement on Ethical Conduct in Human Research. Collaborations between institutions are subject to the framework outlined in the Australian Code for the Responsible Conduct of Research (2007).

Patients should have provided informed consent for the use of their samples or data for research; this consent should be distinct from the consent process for clinical testing. Consent may be

• Specific to a project under consideration, or
• Extended, where consent is given for the use of data or tissue in future research projects that are an extension of or closely related to the original project or in the same general area of research, or
• Unspecified, where consent is given for the use of data or tissue in any future research.

There is ongoing debate in Australia about unspecified, also known as “open-ended”, consent to the use of genomic data in research. Should the patient give extended or unspecified consent, further consents are required to enter data or tissue into databases or biobanks.
The informed consent process should clearly state the protocol with respect to re-contacting the patient about incidental findings identified during subsequent research projects.

The view presented by NHMRC is that human tissue samples should always be regarded, in principle, as re-identifiable. All requests from researchers for the release of de-identified data from samples submitted for diagnostic testing, and of associated laboratory data, to biobanks or databases will require approval by the Ethics Committee with responsibility for oversight of activities of the pathology service. Provided a suitable ethical framework is in place, the diagnostic laboratory can provide samples and data to the researchers, but should retain sufficient sample for the minimum retention period and laboratory records to meet NPAAC requirements for the retention of the health record.

4.3.1 Resources

• NHMRC 2007 National Statement on Ethical Conduct in Human Research.
• Australian Code for the Responsible Conduct of Research.
• PHG Foundation (2011) Next steps in the sequence. The implications of whole genome sequencing for health in the UK.
• US National Human Genome Research Institute Strategic Plan 2011.
• Data submission policy of dbGaP database.

Chapter 2 Wet lab

1. Introduction

Next generation sequencing (NGS) has been adopted in all areas of molecular diagnosis. Laboratories must become familiar with the critical differences between NGS and traditional Sanger sequencing. The wet laboratory process is one such area of critical difference. Robust quality assurance and quality control procedures are essential to ensuring the reliability of NGS testing results. This chapter will focus on the “wet” laboratory issues including laboratory environment, sample/library preparation, template generation, sequencing and quality assurance in genomic diagnostic application.

1.1 Wet lab processes

Many of the guidelines in this document are common to all forms of nucleic acid testing.
These guidelines should be read in conjunction with ISO 15189 and all relevant NPAAC documents, but particularly Requirements for Medical Testing of Human Nucleic Acids, and Requirements for the Development and Use of In-House In Vitro Diagnostic Devices.

We propose that these principles and guidance could form a foundation for future specifications of performance and formal regulations of genomic testing. It is not our intention to generate a user guide or to provide all the solutions. Instead, we have included some relevant resources for reference. For example, some relevant “wet” laboratory issues are covered on the website of the Division of Laboratory Programs, Standards, and Services (DLPSS) of the US Centers for Disease Control and Prevention (CDC).

1.1.1 Resources

• GenomeWeb Clinical Sequencing News.
• CLSI: Molecular Methods for Clinical Genetics and Oncology Testing.
• NPAAC: Requirements for the Supervision of Medical Pathology Laboratories.
• Gargis AS, Kalman L, Berry MW, et al. Assuring the quality of next-generation sequencing in clinical laboratory practice. Nat Biotechnol. 2012; 30(11):1033-6.
• Best Practice Guidelines for the Use of Next Generation Sequencing (NGS) Applications in Genome Diagnostics: A National Collaborative Study of Dutch Genome Diagnostic Laboratories. Hum Mutat. 2013 Jun 17.
• Ellard S, Lindsay H, Camm N, et al. Practice guidelines for Targeted Next Generation Sequencing Analysis and Interpretation. UK Association for Clinical Genetic Science; 2014.
• Linderman MD, Brandt T, Edelmann L, et al. Analytical validation of whole exome and whole genome sequencing for clinical applications. BMC Medical Genomics 2014; 7: 20.
• Pritchard CC, Salipante SJ, Koehler K, et al. Validation and implementation of targeted capture and sequencing for the detection of actionable mutation, copy number variation, and gene rearrangement in clinical cancer specimens.
The Journal of Molecular Diagnostics 2014; 16(1): 56-67 Aziz N, et al. (2014) College of American Pathologists' Laboratory Standards for NextGeneration Sequencing Clinical Tests. Arch Pathol Lab Med. Rehm HL et al (2013) ACMG clinical laboratory standards for next-generation sequencing. Genet. Med. 15 733-747 Van Keuren-Jensen, Keats and Craig (2014) Bringing RNA-seq closer to the clinic Nat. Biotechnol. 32:884-885 Zook JM, et al. (2014) Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat Biotechnol 32:246-251. 2. Measures to Control Contamination 2.1 The laboratory should be designed to minimise the contamination of samples at different stages of the workflow with other specimens or amplified products. Laboratories should ensure the physical design can accommodate separate areas for patient derived samples and amplified material. Possible cross contamination between these areas including by movement of equipment, staff, or aerosols should be assessed and managed. Measures should be available to both detect cross contamination between clinical samples, and to eliminate it. Detection may include the use of processing blanks or environmental monitoring. Elimination may include the use of hypochlorite or other decontamination measures. For further information refer to refer to NPAAC Requirements for the Medical Testing of Human Nucleic Acids. 2.2 Cross contamination between samples due to carryover from equipment: Laboratories should ensure recommended and appropriate maintenance and cleaning processes are performed to eliminate carryover contamination. Laboratories should include a monitoring process for carryover contamination as part of regular internal quality control Sample indexes (barcodes) used to identify unique reads in pooled libraries can be used to detect carryover contamination. These should be re-used on the longest cycle possible. 
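One way to use sample indexes for carryover monitoring, as described in 2.2, is to count reads per index in a run and flag any index that was not assigned to a sample in that run. The sketch below is illustrative only: the function name, the `min_reads` noise floor and the example index sequences are assumptions, and a laboratory would set its own threshold empirically.

```python
from collections import Counter

def flag_unexpected_indexes(read_indexes, expected, previous_run, min_reads=10):
    """Report index sequences observed in a run that no sample was assigned.

    read_indexes : iterable of index (barcode) sequences, one per read
    expected     : set of indexes assigned to samples in this run
    previous_run : set of indexes used in the preceding run on the same instrument
    min_reads    : hypothetical noise floor below which an index is ignored
    """
    counts = Counter(read_indexes)
    flagged = {}
    for index, n in counts.items():
        if index in expected or n < min_reads:
            continue
        flagged[index] = {
            "reads": n,
            # An unexpected index that was used in the previous run points to
            # carryover from that run rather than random index errors.
            "seen_in_previous_run": index in previous_run,
        }
    return flagged

# Toy data: two expected indexes, one index left over from the previous run,
# and a couple of stray reads below the noise floor.
reads = ["ACGT"] * 500 + ["TTAG"] * 480 + ["GGCA"] * 25 + ["CCCC"] * 2
report = flag_unexpected_indexes(
    reads, expected={"ACGT", "TTAG"}, previous_run={"GGCA"}
)
print(report)
```

This check is only informative when the indexes of the previous run differ from those of the current run, which is why index sets should be rotated on the longest cycle possible.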
Consecutive runs of the same sequencing instrument using the same barcode indexes should be avoided. Frequent reuse of the same set of barcode indexes will compromise the laboratory's ability to detect cross-contamination at any stage of the sequencing procedure.
2.3 Sample indexing should be performed at the earliest possible stage of library preparation to allow subsequent detection of cross-contamination. The laboratory should avoid workflows that offer the potential for undetectable sample cross-contamination. Workflows that call for multiple manipulations, additions, and incubations of samples prior to index ligation or amplification increase the risk of undetectable sample-to-sample cross-contamination, whereas workflows that add unique indexes to each sample early in the library preparation process provide a means to make cross-contamination detectable.
2.4 Laboratories should consider including identity SNPs within the assay to confirm patient identity. Identity SNPs could be included within each assay and interrogated with a second method to confirm patient identity if no unique variants are identified within the genes analysed. These SNPs can also be used to monitor and detect any carryover contamination within the data. Where members of the same pedigree have been analysed, bioinformatics analyses to confirm family relatedness may also prove useful to highlight errors in specimen identification, processing or contamination.
3. Wet Workflow Validation
3.1 The genomic platform used must meet the specifications required for the diagnostic purpose and be operated in accordance with best practice as determined by the manufacturer. Consideration should be given to biases inherent in the platform of choice.
Particular attention should be given to ensuring that any systematic weaknesses or errors of the sequencing system do not limit the diagnostic specificity of the assay or, if such flaws exist, that orthogonal testing is employed to detect variants in regions of bias. Examples include regions of high GC content or repetitive regions.
3.2 Diagnostic laboratories should validate the operational performance of the wet laboratory workflow used in molecular diagnosis. Expansion of genomic methods for diagnostic applications makes it increasingly important to demonstrate data quality, reliability and reproducibility. Diagnostic laboratories should empirically determine their minimum requirements for data quality. Analytic sensitivity and specificity are important performance characteristics for genomic diagnostic applications. Diagnostic laboratories should document these aspects of the laboratory workflow by comparing test results obtained under the conditions defined above with those obtained from a gold standard method (usually Sanger sequencing).
3.2.1 Resources
• Aziz N, et al. (2014) College of American Pathologists' Laboratory Standards for Next-Generation Sequencing Clinical Tests. Arch Pathol Lab Med.
• Gargis AS, Kalman L, Berry MW, et al. Assuring the quality of next-generation sequencing in clinical laboratory practice. Nat Biotechnol. 2012 30(11):1033-6. PubMed PMID: 23138292.
3.3 Diagnostic laboratories should regularly monitor the performance of the wet laboratory workflow used in molecular diagnosis. Inclusion of known DNA control/standard samples at <10% of the pooled libraries at regular intervals would allow ongoing monitoring of assay performance and data analysis processes.
3.4 The use of outsourced platforms and services for diagnostic services should meet all of the standards outlined in this document. If part of the genomic testing process is to be outsourced, NATA accredited providers or providers showing full compliance with NPAAC standards must be used. It remains the responsibility of the clinical laboratory to review, retain and furnish for audit all documentation related to clinical testing.
4. Sample Preparation
4.1 The laboratory should assess the quantity and quality of DNA samples before proceeding with diagnostic application. Failure to exclude samples of poor quality or with an insufficient quantity of amplifiable DNA can significantly affect the sensitivity and specificity of genomic diagnosis and lead to the possibility of false negative results. This is of particular significance where the sample type may be associated with limiting amounts of DNA, for example FFPE tissue or cell-free circulating DNA. Failure to exclude such samples can also affect turnaround time, owing to the long cycle of the genomic testing process. When cell-free circulating DNA is measured for the purposes of non-invasive prenatal screening or testing, the laboratory should have a process to ensure that adequate amounts of foetal DNA (i.e. in accordance with the sensitivity limit determined for the assay) are present in the sample prior to data analysis and interpretation of results.
4.2 Diagnostic laboratories should determine an appropriate range of DNA sample concentration and types to be included for an efficient test using genomic methods. Where appropriate, consideration should be given to including related affected and unaffected samples in the analysis. For example, trios (proband and both parents) may be sequenced to confirm a de novo change, or paired tumour and normal samples may be sequenced to exclude a germline origin for a cancer variant.
4.3 When only small amounts of tissue are available for somatic testing, the laboratory should determine the minimum specimen size and tumour proportion needed for successful analysis. Tissue volume and cellularity are usually estimated by microscopic examination by a competent person. Sufficient purity or proportion of targeted cells can then be achieved through macro-dissection.
5. Library Preparation
5.1 The laboratory should have an effective system to track the samples during the multiple-step process of library preparation. For laboratories handling in excess of 1000 samples per year, a Laboratory Information Management System capable of tracking a multistep workflow with multiple samples and QC steps should be considered.
5.2 The laboratory should have a quality control procedure to assess the adequacy of DNA fragmentation procedures. For laboratories that use protocols involving DNA fragmentation, quality assessment of the fragmentation procedure is essential to ensure the right size distribution and an accurate amount of fragmented DNA. The latter is critical for equimolar representation if multiple barcoded samples are to be subsequently pooled for library preparation.
5.3 The laboratory should undertake quality assurance measures during the validation phase to demonstrate that no significant allele bias or allele dropout occurs during target enrichment processes. The laboratory should determine the optimal conditions for library preparation. Documented metrics of library preparation performance should be generated and used to QC library preparation steps on all clinical samples. For example, the effect of input mass of DNA, fragmentation conditions, PCR cycles, etc. should be assessed. QC metrics in the form of Bioanalyser traces, spectrophotometric readings, or real-time PCR results should be routinely collected and compared to those of an optimal validated run.
6. Template Generation
6.1 The laboratory should have a quality assessment procedure to assess the quality and quantity of a prepared DNA library used for template generation. An accurate estimation of DNA library quantity is essential for optimal clonal amplification. Quantification should be based on amplifiable templates (i.e. DNA fragments with properly ligated adaptors). For example, quantitative PCR (qPCR) has high levels of sensitivity and specificity and can accurately measure quantities of DNA.
6.2 The laboratory should have a quality assessment procedure to assess the adequacy of clonal amplification used for template generation. Quality assessment of the clonal amplification procedure is essential to ensure an adequate representation of DNA samples in the template. This is critical for equal representation if multiple barcoded samples have been pooled during library preparation.
7. Data Generation
7.1 The laboratory should establish empirically the coverage necessary for accurate detection of sequence variants and copy number changes, and provide the best estimation of false positive and negative rates. The laboratory should employ quality control measures that specify the quantity and quality of DNA sequence data needed to accurately differentiate all targeted sequence variants. This is especially critical when a multiplexed target enrichment procedure has been used to generate libraries. The laboratory should ensure that there is sufficient coverage for the detection of aneuploidy, e.g. in non-invasive prenatal trisomy 21 testing.
7.2 If multiple samples are to be sequenced simultaneously, the laboratory should have quality assurance measures to demonstrate that DNA sequence data generated cannot be attributed to the wrong sample. Consideration should be given to the use of barcoded DNA samples and the possibility of sequence data being misdirected to the wrong specimen.
7.2.1 Resources
• College of American Pathologists: Molecular Pathology Checklist 2012.
Includes massively parallel sequencing. Part of a suite of checklists available for purchase online; not available separately.
7.3 Data should be stored as required for diagnostic DNA studies. Consideration should be given to the most suitable data format to retain (see further discussion in the Bioinformatics chapter). As a minimum, the raw reads and quality scores should be kept. Data storage should also comply with overarching regulatory and legislative requirements (see the chapter on Ethical and Legal Issues).
7.4 Any exception should be recorded for patient samples where steps used in the analytical process deviate from laboratory standard operating procedures. This exception log should record the reason(s) for deviation and should retain links to the patient sample.
7.4.1 Resources
• College of American Pathologists: Molecular Pathology Checklist 2012. Includes massively parallel sequencing. Part of a suite of checklists available for purchase online; not available separately.
• NPAAC: Requirements for the Retention of Laboratory Records and Diagnostic Material (Fifth Edition 2009).
8. Quality Control and Quality Assurance
8.1 The laboratory director should be able to identify the appropriate quality metrics that are suitable for their genomic tests. Consideration should be given to cross-platform confirmation. Sanger sequencing should be considered to reduce false positive and/or negative rates, particularly for small indel variants. The limitations of genomic testing should be presented in the final report (see the details in the Reporting chapter). QC of sequencing data may include:
• Base call quality scores
• Read depth
• Uniformity of read coverage
• Read enrichment (for capture-based methods)
• Percentage PCR duplicates (for capture-based methods)
• Allelic read percentage
• GC bias
• Decline in signal intensity along a read
8.2 The laboratory should implement quality assurance measures that evaluate the entire process.
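As an illustration of three of the sequencing QC measures listed in 8.1 (base call quality, read depth and allelic read percentage), the following is a minimal sketch. It assumes quality strings use the Phred+33 encoding common in FASTQ files, and the `min_depth` value in the example is a placeholder rather than a recommended threshold; each laboratory would establish its own acceptance criteria during validation.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred quality of one read from its FASTQ quality line.

    Assumes Phred+33 encoding, where each character's ASCII code minus 33
    is the base call quality score.
    """
    if not quality_string:
        raise ValueError("empty quality string")
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)

def coverage_summary(depths, min_depth):
    """Mean read depth and the fraction of target positions at or above min_depth."""
    if not depths:
        raise ValueError("no positions supplied")
    n = len(depths)
    return {
        "mean_depth": sum(depths) / n,
        "fraction_at_min_depth": sum(1 for d in depths if d >= min_depth) / n,
    }

def allelic_read_percentage(alt_reads, total_reads):
    """Percentage of reads at a position that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * alt_reads / total_reads

# Toy values only; min_depth=30 is a placeholder, not a recommendation.
read_quality = mean_phred("IIII")
coverage = coverage_summary([100, 120, 80, 10], min_depth=30)
allele_pct = allelic_read_percentage(48, 100)
print(read_quality, coverage, allele_pct)
```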
Well-characterised DNA samples should be used as internal quality control samples. Cell lines are renewable, but may have some balanced or unbalanced chromosomal rearrangements. Blood samples from young subjects (<55 years) are typically free from such rearrangements, but are in limited supply. Rearrangements that are identified may reflect the age of the donor or be a consequence of the culture process. Consideration should also be given to obtaining reference materials from overseas. For example, the US Food and Drug Administration (FDA) has recently completed the Sequencing Quality Control (SEQC) project as part of Phase III of the MicroArray Quality Control (MAQC-III) project. Its aims were to assess the technical performance of genomic platforms by generating benchmark datasets with reference samples, and to evaluate the advantages and limitations of various bioinformatics strategies in RNA and DNA analyses.
Acceptable intra- and inter-run variability should be established during validation and monitored in diagnostic laboratories. It is important to determine assay precision, that is, the degree to which repeated measurements give the same result, covering both repeatability (within-run precision) and reproducibility (between-run precision).
Genomic technologies are rapidly evolving. Consideration should be given to whether positive findings in genomic analysis should be confirmed by a different chemistry or a second method, particularly at the initial validation stage and for results that affect clinical decision-making. The laboratory should monitor, implement and validate upgrades to instruments, sequencing chemistries and reagents or kits used to generate genomic data.
8.2.1 Resources
• Forsberg LA, Rasi C, Razzaghian HR, et al. Age-related somatic structural changes in the nuclear genome of human blood cells. Am J Hum Genet. 2012;90:217-28. PMID: 22305530.
• College of American Pathologists: Molecular Pathology Checklist 2012. Includes massively parallel sequencing. Part of a suite of checklists available for purchase online; not available separately.
• FDA: MicroArray Quality Control project.
• Roychowdhury S, et al. Personalized oncology through integrative high-throughput sequencing: a pilot study. Sci Transl Med 2011; 3:111-21.
8.3 Laboratories performing diagnostic genomic testing should participate in suitable genomic proficiency testing or inter-laboratory sample exchange programs to meet the requirements for external quality assessment measures. Laboratories should establish a reportable range for each assay, whether it covers multiple genes, an exome or large genomic regions.
8.3.1 Resources
• NPAAC: Requirements for Participation in External Quality Assessment (Fourth Edition 2009).
• CDC: Next-generation Sequencing: Standardization of Clinical Testing (Nex-StoCT) Workgroup Principles and Guidelines.
CHAPTER THREE Bioinformatics
1. Introduction
1.1 Scope
Diagnostic applications of genomic testing span a wide range of approaches. These may include copy number analysis using DNA microarrays or resequencing of single genes (in high multiplex), gene panels, whole exomes, whole genomes, tumour profiling, non-invasive prenatal screening/testing, methylation analyses and RNA-Seq. The scope of this chapter is restricted to MPS technologies applied to clinical diagnostic DNA analysis. Excluded from scope are analyses of RNA and transcriptomes, epigenetic and methylation analysis, and other applications of MPS. The issues addressed cover the range of MPS testing for genes, panels of genes, exomes and whole genomes. As the size and complexity of the analysis increase, additional procedures and safeguards may need to be included to ensure robustness and reliability of the analysis.
1.2 The Bioinformatics Pipeline
A “bioinformatics pipeline” refers to a number of computational tasks, generally applied sequentially (hence the term “pipeline”), which take as input the output of an MPS instrument, such as image or FASTQ files, and progressively analyse these data through key steps, ending with a VCF file or, further downstream, an annotated spreadsheet (CSV, TSV) or text file. While there is no one standard pipeline, most bioinformatics pipelines convert the data through a series of fairly standardised milestones. A bioinformatics pipeline can be supplied by the MPS instrument vendor, built from other proprietary software, or built from open-source software. None of these approaches has been shown to be innately superior to the others, provided they are selected, tuned, validated or verified (as appropriate) and applied correctly.
Primary analysis: This phase receives raw electronic information from the MPS instrument, and converts it using the vendor’s proprietary algorithms into genomic signals such as nucleotide positions and ordering (“base calling”). The laboratory usually has relatively little control of this phase as it is under the instrument manufacturer’s control. Where multiplexing strategies have been applied, de-multiplexing is performed at this analysis stage; de-multiplexing re-identifies the sample from which individual sequence reads were derived. For amplicon sequencing strategies, primers have to be trimmed from the reads. The outputs of the primary analysis phase are usually FASTQ files. Quality control (including machine metrics) and acceptance criteria should be applied at this stage.
Secondary analysis: This phase receives the FASTQ files from the primary analysis, maps (or aligns) the reads to the reference sequence and identifies changes from the reference sequence (variant calling). The secondary analysis pipeline must be tailored to the MPS technical platform used.
For example, duplicates arising from PCR strategies are typically marked for capture-/enrichment-based approaches, where this strategy helps identify clonally-derived sequences and potential sequence artefacts. In contrast, PCR duplicates are not marked in amplicon-based sequencing strategies. Local realignment can reduce alignment mismatches, increasing accuracy and minimising false-positive variant calls. Variant calling is then performed to identify sequence variations from the reference, such as SNVs and small insertions/deletions, copy number alterations and structural changes. The outputs of the secondary analysis phase are usually BAM and VCF files. There are a large number of commercial, academic and in-house tools in use for the secondary analysis of MPS data. Further quality control should be applied at this stage.
Tertiary analysis: Tertiary analysis concerns the annotation of the identified sequence variants and may involve a combination of the following strategies:
• Comparison of the identified sequence variants to those reported in the most appropriate of the various polymorphism databases (e.g. dbSNP, dbVar, 1000Genomes, Exome Aggregation Consortium, Exome Variant Server)
• Annotation of the resulting transcript consequences (synonymous, truncating, missense, splice site etc.)
• Application of tools to predict the severity of the alteration, such as in silico pathogenicity prediction tools, splice site prediction tools, Grantham difference, assessment of sequence conservation, comparison to known protein domains
• Comparison to variants documented in clinical variant databases (e.g. ClinVar, HGMD, OMIM, LOVD, DECIPHER) and locus- and disease-specific databases
• Review of functional data relevant to the variant/locus, including gene expression data, in vitro, and in vivo studies
• Research of the variant/gene published in peer-reviewed literature
For large-scale genomic investigations, such as expanded gene panels, whole-exome or whole-genome analysis, tertiary analysis further involves a process of variant filtering and prioritization, by removal of findings of lesser interest. The aim of variant filtering and prioritization is to reduce the number of candidate variants to those most likely associated with disease. For genome-scale investigations, variant filtering and prioritization is typically performed in a (semi-)automated fashion. The resulting pre-filtered set of candidate variants is then manually reviewed in further detail to allow clinical interpretation and classification of the sequence variants, and to take into account the current limitations of annotation databases; clinical interpretation and reporting of findings are discussed in chapter 5. The outputs of the annotation and filtering phases are commonly annotated VCF or CSV/TSV (spreadsheet) files. Further quality control system standards can be applied at this stage.
2. Documentation
Comment: Laboratories have a choice of using vendor-supplied pipelines, open-source pipelines, or some combination of both. In general, less documentation is required for vendor-supplied pipelines, but more customisation and fine-tuning is possible for in-house developed or applied software. The requirements described in this section apply regardless of the source of the bioinformatics pipeline.
2.1 The laboratory must document all components of, changes to, and auditing of the informatics pipeline. The laboratory must document all components of the informatics pipeline, including software packages, custom scripts and algorithms, reference sequences and databases.
Any changes, patch releases or updates in processes or version numbers must be documented with the date of implementation, such that the precise informatics pipeline and annotation sources used for each test and report are traceable. If information from public websites is used, the date of access should be documented.
2.2 The laboratory must use version control to track software releases and updates to analysis methods. The laboratory may consider use of dedicated version control software to assist with this requirement for managing software code, such as Concurrent Versions System (CVS), Apache Subversion (SVN), or Git. There are also dedicated software tools for management and control of laboratory method documents and validation records.
2.3 The laboratory must document the quality metrics assessed during a test. For the informatics pipeline, relevant quality metrics include but are not limited to: the total number of reads passing quality filters, the percentage of reads aligned, the number of single nucleotide polymorphisms (SNPs) and insertions and deletions (indels) called, and the percentage of variants in dbSNP.
2.4 The laboratory must document the results of the pipeline validation. The validation documentation must detail the performance of the pipeline, such as its sensitivity, specificity and accuracy in detecting variants, and any limitations of the pipeline. The validation document must be readily available to staff involved in MPS-based genetic testing.
2.5 The laboratory should document all training and staff qualifications. Given the rapid advances in bioinformatics, laboratories implementing NGS-based assays need to consider appropriate staff training and ongoing professional development of staff in bioinformatics. Staff involved in the reporting of NGS results must have, as a minimum, an understanding of the bioinformatics analysis steps and resources used for annotation.
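The traceability requirements in 2.1 to 2.3 can be met in many ways. As one hedged illustration, the sketch below stores a per-test record of software versions, reference file checksums, database access dates and quality metrics as JSON. The class name, field names, tool labels, version strings and metric values are all hypothetical placeholders and do not come from this guideline or from any particular pipeline.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field

@dataclass
class PipelineRunRecord:
    """Hypothetical per-test record of what was run and how well it performed."""
    sample_id: str
    run_date: str                 # date the pipeline was executed (ISO 8601)
    software_versions: dict       # component name -> version string
    reference_checksums: dict     # file label -> SHA-256 digest of the file used
    database_access_dates: dict   # public resource -> date of access
    quality_metrics: dict = field(default_factory=dict)

def sha256_digest(data: bytes) -> str:
    """SHA-256 digest used to pin the exact reference file content."""
    return hashlib.sha256(data).hexdigest()

# Placeholder bytes stand in for the reference file contents in this sketch.
record = PipelineRunRecord(
    sample_id="SAMPLE-0001",
    run_date="2015-05-01",
    software_versions={"aligner": "1.2.3", "variant_caller": "4.5.6"},
    reference_checksums={"reference_genome": sha256_digest(b"placeholder")},
    database_access_dates={"dbSNP": "2015-04-20"},
    quality_metrics={
        "reads_passing_filter": 1000000,
        "percent_reads_aligned": 98.5,
        "snvs_called": 21000,
        "indels_called": 1500,
        "percent_variants_in_dbsnp": 97.0,
    },
)

# Serialise with sorted keys so that records are easy to diff and audit.
serialised = json.dumps(asdict(record), indent=2, sort_keys=True)
print(serialised)
```

Keeping such records under the same version control system as the pipeline code is one way to link each report to the exact configuration that produced it.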
2.6 The laboratory must document the process of data handling and storage. The laboratory needs to define the minimum set of data to store. Typically, this will involve storage of .bam and .vcf files but not image files. Alternatively, the laboratory may store .fastq files to allow reanalysis of the primary data. Interpreted variant call files, such as those produced after review of the initial calls, must also be stored.
2.7 The laboratory must define and document the conditions for data reanalysis. As our understanding of sequence variation expands and our bioinformatics tool set improves, it may be necessary to re-evaluate the annotation of a variant or to re-analyse the sequence data. The laboratory must specify under which circumstances, if any, such reanalysis is to be performed.
3. Validation
The general principles of validation of laboratory tests (IVDs) (see NPAAC Requirements for the Development and Use of in-house in-vitro Diagnostic Devices - 2014) also apply to MPS assays. These include design, production, technical validation, monitoring/improvement, and documentation requirements. However, that document does not address aspects specific to genomics and MPS, which are covered in greater depth in resource documents such as Clinical Laboratory Standards Institute. MM09-A2: Nucleic Acid Sequencing Methods in Diagnostic Laboratory Medicine; Approved Guideline - Second Edition (February 2014) and in Gargis et al. (2012).
Risk of errors in bioinformatics pipeline: In an analysis pipeline for identification of sequence variants, one must have high confidence that the resulting variant calls have high sensitivity and specificity. Although true positives (TP) can be distinguished from false positives (FP) easily through external validation, it is almost impossible to systematically distinguish false negatives (FN) from the vast number of true negatives (TN). Different pipelines may vary widely in their degree of concordance of classification of findings (e.g. O’Rawe et al. 2013), with the false negative rate being particularly difficult to address, especially for indels compared to SNVs. The majority of differences between variant calling pipelines appear, however, in ‘problem regions’ of the genome, such as repeat sequences, regions with sequence homology elsewhere, low complexity regions and regions with errors in the reference assembly; the concordance between calls can often be further improved by applying post-variant calling filters to remove artefactual calls (Li et al Bioinformatics 2014 PMID: 24974202). Besides variant calling, the use of different variant annotation software programs and transcript annotation files can also make a substantial difference to annotation results that is not commonly appreciated (McCarthy et al. 2014). These troubling reports highlight the need to ensure that bioinformatics pipelines are subjected to rigorous validation and QC, especially for clinical diagnostic applications.
3.1 Design of validation study
3.1.1 The validation study must be designed to provide objective evidence that the bioinformatics pipeline is fit for the intended purpose. Validation is the process of measuring the performance characteristics of a bioinformatics pipeline, and ensuring that the pipeline meets certain pre-defined minimum performance characteristics before it is deployed.
3.1.2 The validation study must identify and rectify common sources of errors that may challenge the analytical validity of the bioinformatics pipeline. As part of the validation study, it is important to gain an understanding of common error sources that may compromise the validity of the pipeline, such as:
● Inherent limitation of individual programs
● Inadequate optimization of parameters of individual programs
● Problems with data flow between individual programs
● Use of incorrect auxiliary files (e.g. wrong human genome reference)
● Hardware or operating system failure
3.1.3 The validation study must establish the analytical validity of the bioinformatics pipeline in terms of being able to correctly detect sequence variants (secondary analyses) and correctly annotate sequence variants (tertiary analyses). Analytical validity refers to the ability of a bioinformatics pipeline to correctly call and annotate a variant. Analytical validity must be achieved before clinical validity can be established. Clinical validity refers to the ability of a test to detect or predict a phenotype of interest. Clinical validity must be established by external knowledge such as results from large-scale population studies or functional studies (refer to the chapter on Reporting).
3.1.4 The laboratory must validate the entire bioinformatics pipeline as a whole, under the given operational environment. A laboratory may choose to put together its bioinformatics pipeline using any combination of commercial, open-source, or custom software. Regardless of whether an individual component has been validated, the laboratory is still required to validate the entire bioinformatics pipeline under its operational environment (i.e., same hardware specification, same operating system, same parameter settings, and same input load).
3.1.5 The validation study must be designed to avoid bias caused by testing on training data. It is important to ensure that quality metrics are measured on reference materials that have not been used for tuning (training) the parameters of any part of the pipeline. The use of training data as testing data may lead to artefactually inflated measurements of various quality metrics.
3.2 Validation process
3.2.1 The laboratory must determine standardised performance metrics of the pipeline. The use of standardised performance metrics ensures that validation results can be communicated and compared unambiguously.
Some commonly used performance metrics are:
• The frequency of True Positive, True Negative, False Positive, and False Negative results
• Accuracy
• Precision
• Sensitivity
• Specificity
• Reportable range
• Reference range
• Limit of detection
The usefulness of these metrics depends on testing on a diverse collection of Reference Materials in an environment that realistically simulates the real operational environment. Depending on the performance characteristics of the analytical system, it may be necessary to use replicate analyses or duplicate samples to achieve satisfactory technical reproducibility.
3.2.2 The validation study must define valid ranges for commonly assessed quality metrics. The correct answer associated with an input FASTQ file is generally not known, except in the case of reference materials (RM). Nonetheless, based on the results from RM and other previous experience, it is possible to establish some general statistics that can be expected from a valid pipeline. For example, it can be checked that a WES data set returns the expected number of variants (10,000 to 50,000). The transition/transversion ratio (Ti/Tv) can also be checked to fall within a defined range. Deviation from these pre-defined ranges may indicate a necessity for closer examination, but does not automatically imply a validity problem.
3.2.3 Acceptability criteria must be defined to describe clearly the minimum quality metrics required to demonstrate the bioinformatics pipeline is fit for purpose. One way to demonstrate acceptability and fitness for purpose is to undertake proficiency testing carried out by a NATA accredited (or international equivalent) third party using a different set of Reference Materials.
3.2.4 The laboratory must benchmark the bioinformatics pipeline using reference material, where available. The reference materials chosen must be appropriate for assessing performance of the pipeline for its intended purpose.
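The range checks described in 3.2.2 can be sketched as follows. The code computes a Ti/Tv ratio from a list of SNVs and tests values against pre-defined bounds. The Ti/Tv bounds in the example are placeholders chosen purely for illustration, not recommended values; the variant count bounds are the WES figures quoted in 3.2.2. Function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES

def titv_ratio(snvs):
    """Transition/transversion ratio for an iterable of (ref, alt) base pairs.

    A transition is purine<->purine or pyrimidine<->pyrimidine; any other
    single-base substitution is a transversion. Non-SNV entries are skipped.
    """
    transitions = transversions = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # not a single-nucleotide substitution
        if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions observed; ratio undefined")
    return transitions / transversions

def within_range(value, low, high):
    """True when value lies inside the pre-defined acceptance range."""
    return low <= value <= high

# Toy call set: four transitions and two transversions.
snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
ratio = titv_ratio(snvs)

# Placeholder Ti/Tv bounds for illustration only.
titv_ok = within_range(ratio, low=1.5, high=3.5)
# Variant count bounds taken from the WES example in 3.2.2.
count_ok = within_range(23000, low=10000, high=50000)
print(ratio, titv_ok, count_ok)
```

As noted in 3.2.2, a value outside the range is a prompt for closer examination rather than proof of a validity problem, so a check of this kind is better used to raise a flag than to fail a run automatically.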
Validation of a bioinformatics pipeline generally involves executing it on input data for which the correct status of each variant is known. These input data are called Reference Material (RM). The usefulness of RM depends on covering a large variety of input, from sequences containing only simple SNVs to sequences containing complex indels. RM can be generated entirely by in silico simulation, or by sequencing real oligonucleotides of known sequence. Note that for the purposes of specific bioinformatics Quality Assurance, this RM may consist of well characterised data sets (e.g. FASTQ files), rather than physical materials such as DNA samples. It is possible to obtain a large variety of RM from in silico simulation. Nonetheless, RM from real sequences should also be employed as they are likely to better capture the characteristics of real data.

Examples of bioinformatics reference materials are consensus variant calls distributed by the Genome in a Bottle Consortium for NA12878, e.g. accessible on the Genome Comparison and Analytics Testing website (http://www.bioplanet.com/gcat), and consensus calls distributed under the Illumina Platinum Genomes initiative (http://www.illumina.com/platinumgenomes/). Both of these datasets include a consensus set of calls from multiple pipelines to allow identification of pipeline-specific artefacts.

3.2.5 The laboratory should compare the results from multiple pipelines, where possible, to allow identification of pipeline-specific artefacts.

Multiple pipelines can generate quite different variant calling results from the same input FASTQ file. One strategy to validate a pipeline is to measure the concordance between the results of a given pipeline and those of several other widely used pipelines. High concordance does not necessarily guarantee correctness, but low concordance indicates problems. Poor concordance commonly overlaps with ‘problem regions’ of the genome, e.g. low complexity regions, as discussed above.
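The concordance comparison described in 3.2.5 can be sketched as follows. This is a minimal illustration, assuming each pipeline's calls have already been normalised and reduced to a set of (chromosome, position, reference allele, alternate allele) tuples. Concordance is measured here as the size of the intersection divided by the size of the union, which is one possible definition among several. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt); assumed already normalised


def concordance(a: Set[Variant], b: Set[Variant]) -> float:
    """Shared calls divided by all calls made by either pipeline."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_concordance(call_sets: Dict[str, Set[Variant]]) -> Dict[Tuple[str, str], float]:
    """Concordance for every pair of pipelines, keyed by sorted pipeline names."""
    return {
        (first, second): concordance(call_sets[first], call_sets[second])
        for first, second in combinations(sorted(call_sets), 2)
    }


def pipeline_specific(call_sets: Dict[str, Set[Variant]], name: str) -> Set[Variant]:
    """Calls made by one pipeline and by no other: candidate pipeline-specific artefacts."""
    others = set().union(*(calls for other, calls in call_sets.items() if other != name))
    return call_sets[name] - others
```

Calls returned by `pipeline_specific` are candidates for closer examination only. As noted above, agreement between pipelines does not by itself establish correctness.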
Any limitations of the chosen pipeline must be defined as part of the validation study.

3.2.6 The validation study must establish appropriate error handling within the pipeline.

A bioinformatics pipeline could fail due to the corruption of an input file generated by primary analysis or by intermediate steps within the pipeline. It could also fail due to excessive load on the server or an interrupted network connection. As part of the validation procedure, it is important to assess whether the pipeline can detect corrupted files or interrupted execution, and generate appropriate error messages.

3.2.7 The validation study must establish appropriate hardware and operating system environments to allow successful execution of the pipeline.

The bioinformatics pipeline can be executed on a dedicated computer server, in a shared high performance computing (HPC) environment, or in the cloud. The successful execution of these programs also depends on the use of an appropriate operating system, appropriate auxiliary software programs, and supporting reference files (e.g., the human reference genome file and gene annotation file). Validation should be conducted in a system that closely resembles the actual operational environment. See also issues raised in section 5 of this chapter.

3.2.8 When changes are made to the test system, the laboratory must demonstrate that acceptable performance specifications have been met before using the changed test system for clinical purposes.

3.2.9 The laboratory must define the limitations of the informatics pipeline.

Common limitations of the bioinformatics pipeline include but are not limited to: the maximum size of indels detectable, regions of poor mapping and/or excessive read depth, regions of poor sequence coverage, and repeat regions and homopolymer sequence regions that may affect variant calling. There may also be specific limitations of individual specimens that can affect the capability of a given bioinformatics pipeline.

4.
Quality Control and Quality Assurance

Quality control (QC) of sequencing data vs. QC of the bioinformatics pipeline: It is important to distinguish between QC for checking the quality of sequencing data and QC for ensuring the correct execution of the bioinformatics pipeline. Data QC is important for checking whether the sequencing data are of sufficiently good quality to ensure variant calling can be performed to the required standard. On the other hand, pipeline QC is concerned with whether the bioinformatics pipeline has been correctly executed according to the predefined quality metrics for a given sequencing data input. Both types of QC are important.

QC of the bioinformatics pipeline may include the following metrics:
• Mapping quality
• Transition/Transversion ratio
• Presence of duplicate reads
• Expected number of variants
• Expected percentage of known variants (e.g. variants in dbSNP)

4.1.1 The laboratory must monitor quality metrics and acceptability criteria of the informatics pipeline established during pipeline validation.

Quality metrics are to be recorded for each test performed and interpreted in the context of the acceptability criteria that were defined during pipeline validation.

4.1.2 Deviation of achieved quality metrics from defined acceptability criteria must be investigated and mitigated.

Significant deviations may require repeat of the test. For example, a deviation in the observed percentage of SNPs present in dbSNP may indicate a problem with variant calling for that sample.

4.1.3 Quality metrics and acceptability criteria must be reviewed regularly to ensure relevance to current test performance. Revalidation must be performed where ongoing deviations are observed and/or substantial changes to the informatics pipelines have been made.

Choice of appropriate quality metrics can be of significant help in troubleshooting the source of the problem in an underperforming test. Trend analysis of bioinformatics quality metrics may also prove to be useful.
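The monitoring described in 4.1.1 and 4.1.2 can be sketched as a comparison of each run's recorded metrics with the ranges defined during validation. The example below is a minimal illustration: it computes the Ti/Tv ratio from a list of (reference, alternate) base pairs and flags any metric falling outside its range. The function names, metric names and numeric ranges used in the usage note are hypothetical placeholders, not recommended values. Each laboratory would substitute the ranges established in its own validation study.

```python
from typing import Dict, Iterable, Optional, Tuple

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T) changes;
# every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def titv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Ti/Tv ratio for (ref, alt) pairs; assumes every pair is a valid SNV with ref != alt."""
    pairs = list(snvs)
    transitions = sum(1 for pair in pairs if pair in TRANSITIONS)
    transversions = len(pairs) - transitions
    return transitions / transversions if transversions else float("inf")


def out_of_range(
    metrics: Dict[str, float],
    ranges: Dict[str, Tuple[float, float]],
) -> Dict[str, Optional[float]]:
    """Return metrics that are missing or fall outside their (low, high) range."""
    flagged: Dict[str, Optional[float]] = {}
    for name, (low, high) in ranges.items():
        value = metrics.get(name)
        if value is None or not (low <= value <= high):
            flagged[name] = value
    return flagged
```

For example, with a placeholder range of `{"variant_count": (10000, 50000)}`, a run reporting 60,000 variants would be flagged for investigation. As stated in 3.2.2, a flagged metric indicates a need for closer examination and does not automatically imply a validity problem.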
The appropriateness of the chosen quality metrics to monitor test performance needs to be reviewed regularly, and at least annually.

4.2 Confirmatory processes

4.2.1 The laboratory must define the policy for confirmation of reported variants.

The policy must include a statement as to the circumstances, if any, under which clinically actionable findings are to be confirmed by use of an orthogonal technology. For example, this may involve resequencing using Sanger sequencing, using a second, different MPS technology, or applying an independent or different technique (such as a protein, enzyme or functional assay). Confirmation of the results in an independent sample with the same assay may be considered in an effort to minimise stochastic effects. The circumstances may depend on the nature of the test request, the performance characteristics of the assay (in particular the defined accuracy of the test), and the intended use of the reported result.

4.2.3 The laboratory should consider use of multiple independent software tools to establish consensus calls or for confirmation of calls.

Depending on the accuracy of individual software tools, establishing consensus of multiple tools may significantly improve the accuracy of the prediction. The policy for use of multiple software tools and the confirmation of calls should be established during pipeline validation.

4.3 Quality assurance

4.3.1 The laboratory must participate in QAP programs for the analysis and interpretation of DNA sequence variants, where such programs are available.

Examples of QAP programs include those organised by the RCPA and the EMQN network. Currently, programs for MPS analysis are in pilot phases.

4.3.2 The laboratory should consider the use of reference materials for ongoing monitoring of test performance.

For example, alignment and variant calling pipelines can be validated and monitored using the Genome in a Bottle, Coriell NA12878, Illumina Platinum Genomes or similar reference materials.
4.3.3. The laboratory should establish the local process for proficiency testing.

Proficiency testing may involve an external QA program, sample exchange, use of electronic sequence files, reference materials and other approaches.

5. General Informatics Aspects

This section refers to general issues that are applicable in all circumstances and environments. Where a laboratory uses off-site or hosted facilities (including “cloud” facilities), these requirements must be met for all stages of the process, including those not physically co-located or under the direct control of the laboratory.

5.1 Data security and privacy

5.1.1 The laboratory must ensure that data management meets requirements for data integrity and security including avoidance of tampering with primary data files and/or corruption of result files.

MPS data may involve the management of very large data files (in excess of hundreds of GB) on shared compute resources. Strategies need to be put in place to ensure that the integrity of data files is maintained (e.g. use of checksum tools during file transfer, management of data permissions and ‘write’ access rights) and that a secure copy of the primary data files (FASTQ) is maintained separately from ‘working copies’, allowing regeneration of results files (BAM, VCF, annotations) if this should be required.

5.1.2 The laboratory must use structured databases wherever possible.

The use of spreadsheets or text files to store information is discouraged as these typically do not allow satisfactory traceability or auditing of changes made.

5.1.3. The laboratory must ensure that data management meets the requirements for protecting patient privacy and autonomy.

General requirements for privacy as they relate to the practice of pathology can be found in the NPAAC Standards: Requirements for Medical Pathology Services, and Requirements for Information Communication.
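By way of illustration of the checksum strategy mentioned under 5.1.1, the sketch below compares SHA-256 digests of a source file and its transferred copy using Python's standard `hashlib` module. The choice of SHA-256 and the function names `sha256_of` and `verify_transfer` are assumptions made for this example. The guideline does not prescribe a particular algorithm or tool.

```python
import hashlib
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def sha256_of(path: PathLike, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks.

    Chunked reading avoids loading very large FASTQ or BAM files into memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(source: PathLike, destination: PathLike) -> bool:
    """True when the copied file has the same digest as the source file."""
    return sha256_of(source) == sha256_of(destination)
```

In practice the digest of the primary data file would typically be recorded at the time of generation, so that later copies can be checked against the recorded value even when the original is no longer mounted.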
Patient autonomy here relates to a patient’s wish to learn, or not to learn, of incidental findings that may arise in the course of testing, and to the general scope of testing to which the patient consented. Data management strategies should consider the masking of information that is outside the scope of testing for a given patient sample. This may involve masking of loci other than those targeted for analysis in a given patient. Masking may be performed at any stage during the bioinformatics analysis pipeline, but must be performed prior to providing annotated variant calls for review to a laboratory scientist, to ensure the scientist is not exposed to information outside the scope of testing.

5.2 Data storage and backup

5.2.1 The laboratory must establish a procedure for the storage and backup of data with particular reference to the management of raw sequence data, primary, secondary, and tertiary analysis files.

The data files to be stored long-term must be identified.

5.2.2 The laboratory must ensure adequate data storage and backup capacity is available.

For MPS data this may require TB of storage to accommodate primary and secondary analysis files. Network speed to manage data transfer and access also needs to be considered.

6. Resources

● PHG Foundation (2011): Next steps in the sequence. The implications of whole genome sequencing for health in the UK.
● Gargis et al, Assuring the quality of next-generation sequencing in clinical laboratory practice. Nat Biotechnol. 2012 Nov;30(11):1033-6.
● Schrijver et al. Opportunities and challenges associated with clinical diagnostic genome sequencing: a report of the Association for Molecular Pathology. J Mol Diagn. 2012 Nov;14(6):525-40.
● Vihinen, Guidelines for Reporting and Using Prediction Tools for Genetic Variation Analysis. Hum Mutat 34:275–277, 2013.
● College of American Pathologists: Molecular Pathology Checklist 2012. Includes massively parallel sequencing.
Part of a suite of checklists available for purchase online; not available separately.
● Pabinger et al, A survey of tools for variant analysis of next-generation genome sequencing data. Brief Bioinform. 2013 Jan 21. PubMed PMID: 23341494.
● Analysis of in silico tools for evaluating missense variants, A summary report. National Genetics Reference Laboratory, Manchester. 2012.
● Best Practice Guidelines for the Use of Next Generation Sequencing (NGS) Applications in Genome Diagnostics: A National Collaborative Study of Dutch Genome Diagnostic Laboratories. Hum Mutat. 2013 Jun 17. PubMed PMID: 23776008.
● EuroGentest, Guidelines for diagnostic next generation sequencing. 2014.
● NPAAC standard: Requirements for the Retention of Laboratory Records and Diagnostic Material.
● NPAAC standard: Requirements for Medical Pathology Services.
● NPAAC standard: Requirements for Information Communication.
● Clinical and Laboratory Standards Institute. MM09-A2: Nucleic Acid Sequencing Methods in Diagnostic Laboratory Medicine; Approved Guideline - Second Edition (February 2014)

Validation (generic):
• Jennings et al. Recommended Principles and Practices for Validating Clinical Molecular Pathology Tests. Arch Pathol Lab Med, Vol 133, May 2009
• Mattocks et al. A standardized framework for the validation and verification of clinical molecular genetic tests. EJHG 2010.
• NPAAC: Requirements for the Development and Use of In-House In Vitro Diagnostic Medical Devices (Third Edition 2014)

Validation (secondary analyses):
• Linderman et al. Analytical validation of whole exome and whole genome sequencing for clinical applications. BMC Medical Genomics 2014. 7:20.
• Cornish and Guda. A comparison of variant calling pipelines using genome in a bottle as a reference. BioMed Research International.
• Heinrich et al. The allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process. Nucleic acids research.
• Meynert et al. Variant detection sensitivity and biases in whole genome and exome sequencing. BMC Bioinformatics 2014.
• Zook et al. Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nature Biotechnology. 2014.
• Meynert et al. Quantifying single nucleotide variant detection sensitivity in exome sequencing. BMC Bioinformatics. 2013.
• Chin et al. Assessment of clinical analytical sensitivity and specificity of next-generation sequencing for detection of simple and complex mutations. BMC Genetics 2013.
• Pirooznia et al. Validation and assessment of variant calling pipelines for next-generation sequencing. Human Genomics. 2014.
• O’Rawe, J. et al. Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing. Genome Med. 5, 28 (2013).

Validation (tertiary analyses):
• Walters-Sen et al. Variability in pathogenicity prediction programs: impact on clinical diagnostics. Molecular Genetics and Genomic Medicine. 2014.
• McCarthy, D. J. et al. Choice of transcripts and software has a large effect on variant annotation. Genome Med. 6, 26 (2014).

Guidelines (tertiary analyses/annotation):
• ACGS Practice Guidelines for the Evaluation of Pathogenicity and the Reporting of Sequence Variants in Clinical Molecular Genetics.
• ACMG Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genetics in Medicine.
• CMGS Practice guidelines for Targeted Next Generation Sequencing Analysis and Interpretation.

Other:
• College of American Pathologists’ Laboratory Standards for Next-Generation Sequencing Clinical Tests. doi: 10.5858/arpa.2014-0250-CP

CHAPTER FOUR Reporting

1. Introduction

The goal of a genomics report is to convey accurate, interpretable and succinct information that is relevant to patient care.
In the massively parallel sequencing era, this simple statement is becoming increasingly difficult to put into practice. This chapter aims to provide guidelines and establish principles that should assist in the preparation of a genomics report.

Approaches to genomic analysis vary in terms of the technology and methodology used as well as the breadth of genetic variation that is interrogated; the analysis may yield information about a single class of genetic variation or may extend to encompass all sequence and structural variants. This presents a number of challenges for the clinical laboratory when preparing a report based on genomic data. Clinical validity and utility are important concepts to keep in mind when formulating the report, although addressing them is beyond the scope of this document, and they are already well covered in existing NPAAC standard publications (validation of in-house IVDs and nucleic acid testing).

A key issue in reporting genomic tests is that variants of known or possible pathogenicity may be identified which are unrelated to the primary clinical indication for the test. Such incidental findings are inevitable in high-resolution genomic studies utilising massively parallel sequencing techniques, which interrogate a greater proportion of the human genomic sequence, presenting difficulties both for laboratories in reporting such variants and for clinicians receiving unsolicited information. The potential for false positive results is also amplified by the increasing number of genes interrogated, and by the low prevalence of some of the disorders which may be the subject of investigation by genomic sequencing.

A second key issue is that a large number of variants of uncertain clinical significance can be identified, and the reporting of this information requires careful management to minimise potential harm while providing the maximum available relevant information for clinical management.
It is essential that laboratories producing reports of genomic tests have clearly defined, evidence based protocols for classifying the clinical significance of detected genetic variants and addressing incidental findings. These protocols should define, before the analytical result is available, which outcomes will be reported and which will not, and should be available to requestors of genomic tests. It is also essential that these results are reported clearly, consistently and unambiguously, using established nomenclature guidelines such as those available from the Human Genome Variation Society (http://www.hgvs.org/mutnomen/) and relevant standardised reporting formats such as the RCPA Guidelines for reporting molecular genetic tests to medical practitioners 2009. It must be recognised that laboratory reports may be read by both experts and non-experts, and may be stored for years in a patient’s medical record.

This chapter provides a guide for the reporting of NGS results in the clinical context. It has been developed in the interests of ensuring the analytical and clinical validity of genomic reports and the consistency and clarity of reporting, thereby assisting in the production of a report that is accurate, interpretable, succinct and relevant to patient care.

2. Published Guidelines

A number of national and international professional bodies have issued policy statements regarding the clinical application of genomic sequencing that include guidelines for the reporting of genomic testing. It is advised that these are consulted to provide a broader overview of the issues relating to the reporting of genomic data.

Published Guidelines for the Clinical application of Genomics that include recommendations on the reporting of massively parallel sequencing data:
• EuroGentest 2014
• Rehm HL et al. Working Group of the American College of Medical Genetics and Genomics Laboratory Quality Assurance Committee.
ACMG clinical laboratory standards for next-generation sequencing. Genet Med. 2013;15(9):733-47
• van El CG et al. Whole-genome sequencing in health care. Recommendations of the European Society of Human Genetics. Eur J Hum Genet. 2013 Jun;21 Suppl 1:S1-5
• Brownstein CA et al. An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge. Genome Biol. 2014;15(3)
• Weiss, Van der Zwaag et al. Best practice guidelines for the use of next-generation sequencing applications in genome diagnostics: a national collaborative study of Dutch genome diagnostic laboratories. Hum Mutat 2013; 34:1313-21
• College of American Pathologists’ Laboratory Standards for Next-Generation Sequencing Clinical Tests. Arch Pathol Lab Med.
• Association for Clinical Genetic Science (ACGS) Practice guidelines for Targeted Next Generation Sequencing Analysis and Interpretation
• Scheuner MT, Hilborne L, Browne J, Lubin IM, et al. A report template for genetic tests designed to improve communication between the clinician and the laboratory. Genet Test Mol Biomarkers. 2012 Jul;16(7):761-9.
• Lubin IM, Caggana M, Constantin C, Gross SJ, Lyon E, Pagon RA, et al. Ordering molecular genetic tests and reporting results: practices in laboratory and clinical settings. J Mol Diagn. 2008 Sep;10(5):459-68.
• Lubin IM, McGovern MM, Gibson Z, Gross SJ, Lyon E, Pagon RA, Pratt VM, et al. Clinical perspective about molecular genetic testing for heritable conditions and development of a clinician friendly laboratory report. J Mol Diagn. 2009 Mar;11(2):162-71.
• CLSI standard MM09-A2 Nucleic Acid Sequencing Methods in Diagnostic Laboratory Medicine (Second Edition, Feb 2014)

3. The Clinical Context of Testing

3.1 The laboratory should have explicit requirements regarding the level and quality of clinical information that must be provided when a genomic test is requested.
The Massively Parallel Sequencing Genomics report must include the context in which testing has been requested. This assists with correct interpretation of the report, and re-statement of the clinical context for testing is currently an NPAAC requirement. Detailed and informative clinical information is also critical for the laboratory to produce a report that presents its conclusions in the most appropriate clinical context. The clinical interpretation of genomic analyses is improved when testing laboratories are provided with discriminating clinical details.

Bioinformatics pipelines are capable of providing interpretative comment on identified variants. Reporting laboratories should ensure that they understand the limitations of informatics interpretations and provide adequate review of automatically generated interpretations in the clinical context of testing.

3.2 The laboratory should maintain active dialog with requesting clinicians regarding requests, interpretation, and reporting of genomic tests.

Strengthening laboratory liaison with requesting clinicians is essential in any rapidly developing field of medical testing. The interpretation and reporting of genomic test results would benefit from a team approach, whereby clinical laboratory scientists, pathologists, clinical geneticists, and other medical professionals are involved in genomic data interpretation. This liaison should extend from requesting of the test and defining the most appropriate genomic testing through to interpretation of the report. Astute clinical assessment and laboratory-clinician consultation can focus genomic analysis on specific genomic regions, potentially enabling diagnosis of monogenic disorders or clarifying differential diagnoses.

4. Variant Reporting

4.1 The laboratory should consistently classify genomic variants according to their clinical significance.
Variants that are classified as benign or likely benign may be omitted from a report, with reports confined to variants classified as pathogenic or likely pathogenic (EuroGentest 2014 final draft; ACMG Standards and Guidelines 2015). Local policy can dictate which variant categories are to be reported; however, a record of all variants identified should be maintained by the testing laboratory, should be readily accessible for review, and may be disclosed upon request to a clinician. The report should clearly state the laboratory's reporting policy, indicating which classes of variants have been reported and highlighting the possible existence of variants which do not appear on the report.

The classification of a variant as benign or pathogenic must be based on a secure evidence base such that significant reclassification in the future is unlikely without additional and convincing functional data. However, the observation of a variant of unknown significance may require a fresh evaluation of that variant to reflect new information that may be available. To ensure consistency, it is highly recommended that a reporting laboratory maintain an in-house database of variants that is professionally curated to a standard acceptable for clinical use (RCPA Clinical database standards document reference) and submit variant data to a clinical standard external database.

4.2 The laboratory should have a clear policy regarding the reporting of variants and this policy must be readily available to referrers.

Irrespective of how genomic variants are classified, there is likely to be a substantial number of variants of ‘unknown significance’ for which there is no relevant evidence to assist interpretation, i.e. there is no evidence base on which to determine clinical significance for the condition under investigation.
Reporting laboratories should be aware of the potential for over-interpretation of results of uncertain significance based on a limited understanding of contextual information. As such, reporting laboratories must minimise the potential for readers of genomic test reports to misconstrue the clinical significance of certain categories of genomic findings, and the ensuing anxiety/harm that this may bring to patients (and their families). The report should be transparent in how it has reached its conclusions, and include information relating to how the findings may influence subsequent clinical judgment, including suggestions for further testing, if necessary.

4.3 Assessing the Available Evidence

The laboratory’s interpretation of clinical significance should be based on sound evidence. Peer-reviewed literature and clinical or near clinical quality databases could be regarded as high quality primary evidence in the assessment of clinical significance for a particular genomic variant. The RCPA Standards for Clinical Databases of Genetic Variants document is a useful guide for measuring the quality of a clinical database.

Functional studies on genetic variants are occasionally performed by clinical laboratories to evaluate clinical significance. Examples include RNA studies to determine the effect of a variant on RNA splicing events, and protein functional studies to determine protein activity. Caution must be exercised with in-house studies which have not been subject to peer review processes to ensure that appropriate controls are included and that the results are analytically valid. Furthermore, in assessing protein function, care must be taken in determining which functions of a multi-functional protein are relevant to the disease state and therefore should be assessed.

When evaluating literature, the quality of the publication should be taken into account.
Critical review of literature cited in reports is a requisite competency for any laboratory geneticist. This should include consideration of statistical significance requirements in case/control and comprehensive familial series.

When utilising genomic databases, consideration should be given to the purpose of each database and the processes used for the classification of variants submitted to the database. It should be noted that merely appearing in a database with a classification of pathogenic does not constitute final proof that a particular variant should be reported as pathogenic in the clinical context of the report. It is more likely that a database entry will provide the starting point for the collection of evidence of pathogenicity and that multiple databases and sources will be required before arriving at this conclusion.

In general, classifications should be based on multiple independent lines of evidence such as in vitro or in vivo functional data, segregation with disease, algorithmic prediction of protein function or RNA transcription events, and disease/normal population variant frequencies. Care should be taken to avoid over-interpreting early reports showing enrichment of a variant in “affected” populations, particularly those that have not been verified in replication studies. This should also apply to variants found to confer low relative risk and predictive power for common (multifactorial) diseases and traits.

4.4 Variant Interpretation

4.4.1 The laboratory professionals who provide clinical interpretations of genomic variants should understand, have access to, and utilise up-to-date resources to aid them in their task.

4.4.2 The laboratory should develop a protocol for the assessment of variant pathogenicity.

See section 4.1 above. Novel and rare genetic variants pose the greatest interpretational challenge owing largely to the continued lack of high-quality, large-scale control data.
Additionally, there is often a lack of information on rare variants for evidence based assessment. The task of assessing seemingly novel or rare variants is challenged by the accelerating pace of discovery of rare variants associated with phenotypic abnormality, as well as a growing number of clinically significant variants recognised to have incomplete penetrance and variable expression. The potential influence of genetic background, in particular modifier sequence variants, is a confounding factor and should be considered where there is any evidence to suggest there may be a modifier effect.

Report writers should be aware of the possibility that the initial diagnosis arrived at in the laboratory might be incomplete and that additional clinically significant genetic variants (that have gone undetected or remain unclassified due to lack of an evidence base at the time of reporting) may underlie the patient’s non-specific or variable phenotype.

The laboratory geneticist should not over-interpret the genomic analysis result, especially where a variant of ‘uncertain’ significance is concerned, in recognition that these findings do not necessarily indicate a diagnosis for the patient. Reported assertions regarding variant pathogenicity should ideally include any mitigating clinical information such as the inheritance pattern, clinical context, and phenotype. Analysts should also provide insight into the extent of what remains unknown and the challenge this brings to the task of clinically interpreting genomic findings. The report should include or cite the evidence which justifies the conclusion regarding the clinical significance of identified genomic variants.

4.5 Systematic Review of Variant Interpretations

4.5.1 The reporting laboratory must have a written protocol for the review of variants and this protocol must be available to referrers.

Interpretation of the clinical significance status of a genomic variant may change in the light of new information.
A key question requiring consideration is how (validated/issued) reports that require modification in light of new information, with their attendant clinical risks, should be dealt with. In practical terms, responsibility for a specific patient lies primarily with the physician with an ongoing patient relationship. Recommendations that call for continued review of variants (including ‘benign’, ‘likely benign’, ‘unknown’ and ‘uncertain’ significance findings), which collectively may number in the tens of thousands per sample in the case of whole exome sequencing and whole genome sequencing, have significant resource implications for laboratories, particularly for professional time. Scheduled review based solely on defined time intervals also raises the possibility of re-issuing clinically inappropriate reports prepared in isolation from the referring clinic. It is advised that laboratories have a formal process for evaluating new evidence, re-interpreting previous individual patient results, re-contacting referrers, and contributing to patient reviews, where required. However, a bi-directional flow from the clinic through continued review of patient files can also contribute to the timely review of variants.

5. Incidental Findings

Genomic analysis will inevitably detect clinically significant variants which are unrelated to the clinical features that prompted testing. The issues associated with detection of these remain under discussion, but their solutions will no doubt involve an emphasis on counselling and education before testing is performed, informed consent with a clear explanation of the current limits of testing and interpretation, maintenance of privacy and confidentiality, and sensitivity to culture within families, their heritage, and their communities. For further information see Chapter 1: Ethical and legal Issues.
Commonly encountered examples of unsolicited findings detected during genomic testing include:
• Detection of consanguinity and incest, where this was not known to ordering clinicians and families;
• Detection of carrier status for autosomal recessive disorders unrelated to the clinical indication for genomic testing;
• Detection of variants involving highly penetrant genes associated with dominant, adult-onset conditions.

5.1 Laboratories performing genomic testing should have clear policies in place for disclosure of incidental findings.

It is advisable that clinicians and patients be informed of these policies and the types of incidental findings that will be reported. Clinicians may give patients the option of not receiving certain results. While these policies should be in place, exceptional circumstances may arise which need to be handled judiciously on a case-by-case basis through laboratory-clinician consultation. In keeping with the principles of good laboratory practice, uncertainty associated with reporting incidental findings is usually best managed with input from a medical genetic specialist and/or the referrer. Reporting of incidental findings should be limited to variants that are unequivocally classified as pathogenic or likely pathogenic and are deemed reportable according to the laboratory’s policy.

6. The Genomic Test Report

Clinical genomic reports should follow the general principles for reporting of genetic tests as described in the RCPA guidelines (Guidelines for reporting molecular genetic tests to medical practitioners, 2009) https://www.rcpa.edu.au/getattachment/3cb35802-3cfa-49fa-9d2f4f199501c551/Guidelines-for-reporting-Molecular-Genetic-Tests.aspx and be consistent with ISO 15189 as a minimum requirement. However, the genomics report is likely to need to convey a much larger amount of information, both in terms of test data and in terms of descriptive information about the test and particularly its limitations.
The challenge is to produce a report that contains all the relevant information for accurate interpretation of the report while avoiding information overload and possible distraction from the actual result required for patient care. One approach to solving this problem is to provide a report with a prominent one-page patient summary containing all class 4 and 5 variants relevant to the request, supplemented with further information relating to the actual test (limitations, bioinformatics pipeline description and metrics) and variants other than class 4 and 5 if appropriate to the request. Such multi-page reports would need to comply with all relevant standards and guidelines for reporting, including page number, report date and the inclusion of patient demographics on each page to ensure unambiguous linking of the entire report. A consistent approach to reporting genomic findings is important, particularly for families dispersed across state or national boundaries. Crucially, those responsible for reporting should appreciate that interpretive differences may influence medical management and patient choices. Even if a report is directed to the expert requesting clinician, it is also important to note that reports may be included in medical records and hence be read by non-experts involved in the patient’s care. Hence, every effort should be made to ensure that the report is succinct, clear and interpretable by as wide a range of relevant clinicians as possible.

6.1 Key requirements of a Genomics Report

The minimum suggested content for a report is described below. The following list is not a recommendation for the structure of the report, which could be ordered to provide a one-page summary of important results relevant to patient care followed by additional pages detailing the test and any further results that may be appropriate.
Report Details
• Reporting laboratory details
• Title of report
• Report status and Report authorisation
• Issue date and time of report

Patient identification
• Name
• Date of Birth
• Unique laboratory identifier
• Gender
• Ethnicity if relevant to testing

Patient diagnosis context
• Clinical details on request
• Specimen Type (blood, tissue and site, fluid)
• Secondary specimen identifier (Block number, referring laboratory identifier)

Test description
• Test category
• Purpose of test (e.g. to assist in the diagnosis of … or the exclusion of…)
• Genes tested list
• Methodology used including confirmation of variants by an orthogonal method if performed
• Limitations to test including any remaining uncertainty where it exists

Result summary
• Inheritance model used for sequencing data analysis if relevant
• Gene name using HGNC approved gene symbol
• Zygosity
• cDNA nomenclature utilising standardised nomenclature (HGVS recommended)
• Protein nomenclature utilising HGVS recommended nomenclature
• Genomic coordinates utilising HGVS recommended nomenclature based on an LRG where available and a RefSeqGene record if not available
• Reference sequences including genome build or reference sequence version
• Variant reporting policy for the reporting laboratory that complies with relevant guidelines

Interpretive comment
• Variant classification as class 1 (benign) through to class 5 (pathogenic)
• Narrative comment indicating the relevance of the identified variants to the reason for the test request
• If applicable the need for follow up or confirmatory testing should be indicated on the report

6.2 Laboratories should include recommendations for appropriate follow-up in reports.

In situations in which further genetic studies may be warranted (e.g. parental testing, segregation analysis, testing of other tissues), these recommendations should be included in the test report.
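The minimum content listed in 6.1 lends itself to an automated completeness check before a report is authorised. The Python sketch below is illustrative only and is not part of these Guidelines: the class names, field names and the `missing_items` helper are hypothetical, only a subset of the listed items is modelled, and the variant descriptions are placeholders rather than real findings.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class ReportedVariant:
    gene_symbol: str         # HGNC approved gene symbol
    zygosity: str
    cdna_hgvs: str           # cDNA description (HGVS recommended)
    protein_hgvs: str        # protein description (HGVS recommended)
    genomic_hgvs: str        # genomic coordinates (HGVS recommended)
    reference_sequence: str  # genome build or reference sequence version
    classification: int      # class 1 (benign) through class 5 (pathogenic)


@dataclass
class GenomicsReport:
    laboratory: str
    title: str
    status: str
    issued: str
    patient_name: str
    date_of_birth: str
    laboratory_identifier: str
    clinical_details: str
    specimen_type: str
    test_purpose: str
    genes_tested: List[str]
    methodology: str
    limitations: str
    narrative_comment: str
    variants: List[ReportedVariant] = field(default_factory=list)


def missing_items(report: GenomicsReport) -> List[str]:
    """Return the names of required items that are empty or out of range."""
    problems = []
    # Report-level items: every field except the variant list must be non-empty.
    for f in fields(report):
        if f.name != "variants" and not getattr(report, f.name):
            problems.append(f.name)
    # Variant-level items: class must be 1..5 and all descriptions present.
    for i, variant in enumerate(report.variants):
        if variant.classification not in range(1, 6):
            problems.append(f"variants[{i}].classification")
        for f in fields(variant):
            if f.name != "classification" and not getattr(variant, f.name):
                problems.append(f"variants[{i}].{f.name}")
    return problems


# Placeholder report with two deliberate omissions (all values are made up).
example = GenomicsReport(
    laboratory="Example Laboratory", title="Genomic test report",
    status="Final", issued="2015-05-01 09:00", patient_name="Test Patient",
    date_of_birth="2000-01-01", laboratory_identifier="LAB-0001",
    clinical_details="Placeholder clinical details", specimen_type="blood",
    test_purpose="To assist in the diagnosis of a placeholder condition",
    genes_tested=["GENE1"], methodology="Placeholder methodology",
    limitations="",  # deliberately left empty
    narrative_comment="Placeholder comment",
    variants=[ReportedVariant("GENE1", "heterozygous", "placeholder cDNA",
                              "placeholder protein", "placeholder genomic",
                              "placeholder build", 7)],  # 7 is not a valid class
)
print(missing_items(example))
```

A check of this kind only confirms that items are present; it cannot judge whether the content is clinically appropriate, which remains the responsibility of the reporting staff.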
6.2.1 Examples/resources
• Caris NSCLC example report
• Foundation One Sample Report for Cancer Related Genes

7. Internal Laboratory Databases

7.1 Laboratories should establish an internal database of genomic findings.

This could serve the purpose of identifying common genomic variants specific to a patient population and/or recurrent false-positive calls associated with a particular genomic platform. The curation of an internal laboratory- and platform-specific list of common benign variants can assist with the interpretation process. This could aid with the process of systematic review of variant interpretations. Any handling of data derived from medical testing should comply with the relevant regulatory and legislative requirements.

8. Sharing Genomic Data

8.1 Laboratories are strongly encouraged to submit genotypic data from genomic testing to appropriate clinical databases to facilitate consistency of interpretation across laboratories.

Genomic testing would benefit from the availability of clinically vetted, regularly updated databases of annotated variants that ideally would include population frequencies and referenced clinical relevance for each variant. Thus, there is a need for consolidation of the various genotype-phenotype databases available into a commonly available and perhaps centralized clinical-grade resource that is publicly accessible. The availability of phenotypic information is essential for the investigation of genotype-phenotype relationships both at the individual patient level and at the global level. Integrating the phenotype information into these clinical-grade resource databases will greatly assist with interpretation of variants identified from genomic analyses. Laboratories should be encouraged to contribute phenotypic information whilst being aware of the issues posed by privacy concerns, data complexity and lack of uniform methods for collection of phenotypic data.

9. Reporting of Somatic Variants

The reporting of somatic variants in oncology should follow the recommendations described previously for germline variants, with the following additional considerations. If the variants are known to predict a likely response to specific therapeutic agents then the relationship between the presence or absence of the variant and the agent must be included on the report. Due to the heterogeneous nature of tumours, the percentage of the variant allele that is considered significant to recommend or not recommend use of the agent should be included on the report along with the uncertainty of measurement for the variant. The percentage of tumour cells in a solid tumour sample or blasts in haematological malignancy must also be included in the report as contextual information.

10. Summary Comment

The utility of a genomics report can be increased by the preparation of a standardised report that adheres to established guidelines. However, of equal importance is the need to ensure that clinicians, genetic counsellors, or others who read these reports have the necessary training and support to optimise interpretation of genomic test results for patient care. In part, this can be achieved by greater collaboration between requestors of genomic tests and the laboratories performing genomic tests; however, formal education of requestors in the interpretation of genomics reports is likely to further increase the utility of genomics reports. Specific training for genomic testing needs to be incorporated into existing professional development programmes for laboratory geneticists, clinical geneticists and other clinical specialities (e.g. cardiology, obstetrics, neurology) and primary care medical practitioners who may be likely to request genomic testing.
With the advent of direct to consumer testing both in Australia and abroad, genomic education of the general public should also be addressed, and may be made feasible by technological advances in mobile applications. It is suggested that individual laboratories reporting genomic tests play an active role in developing and implementing a continuing professional development programme focussed on genomic testing by establishing structured in-house training programmes for their staff and potential referrers who are seeking to extend their professional competency into this arena. Such a programme should focus on the tests offered by the individual reporting laboratory.

10.1 Resources
• NPAAC. Requirements for Medical Testing of Human Nucleic Acids 2012.
• NPAAC. Requirements for Information Communication 2007.
• International System for Human Cytogenetic Nomenclature (ISCN).
• Genetic testing in asymptomatic minors: Recommendations of the European Society of Human Genetics. Eur J Hum Genet 2009;17(6):720-1.
• RCPA. Guidelines for reporting molecular genetic tests to medical practitioners 2009.
• HGVS. Nomenclature for the Description of Sequence Variants.
• Riggs et al. Phenotypic Information in Genomic Variant Databases Enhances Clinical Care and Research: The International Standards for Cytogenomic Arrays Consortium Experience. Hum Mutat 2012;33:787-96.
• Kircher et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 2014;46:310-5.
• MacArthur et al. Guidelines for investigating causality of sequence variants in human disease. Nature 2014;508:469-76.
• Samocha et al. A framework for the interpretation of de novo mutation in human disease. Nat Genet 2014;46:944-50.
• Hofman et al. Yield of molecular and clinical testing for arrhythmia syndromes: report of 15 years' experience. Circulation 2013;128(14):1513-21.
• Thompson et al. Application of a 5-tiered scheme for standardized classification of 2,360 unique mismatch repair gene variants in the InSiGHT locus-specific database. Nat Genet 2014;46:107-15.
• Opportunities and Challenges Associated with Clinical Diagnostic Genome Sequencing. A Report of the Association for Molecular Pathology. The Journal of Molecular Diagnostics, 2012.
• Emory University public-facing database.
• de Leeuw et al. Diagnostic interpretation of array data using public databases and internet sources. Hum Mutat 2012;33:930-40.
• ESHG. Whole Genome Sequencing and Analysis and the Challenges for Health Care Professionals: Recommendations of the European Society of Human Genetics. 2012 (draft).

CHAPTER FIVE IT Infrastructure

1. Introduction

Genomic technologies introduce complex analytical methods that require substantial bioinformatic and IT infrastructure, neither of which is the usual domain of regulatory and/or accreditation agencies. This chapter will discuss the specific IT infrastructure issues that should be addressed by laboratories considering genomic methods.

1.1 IT process overview

Following sequencing, primary analysis (base-calling) usually occurs on the sequencing instrument and is beyond the control of the user. Secondary analysis (alignment and variant-calling) can occur on- or off-instrument. Tertiary analysis usually occurs off-instrument. Most of the sequencing manufacturers provide appropriate computing power and storage on-instrument.
Where analyses occur off-instrument, it is necessary for the laboratory to consider the following issues (also see Figure):
• The level of processing power required to perform timely analyses
• The need to ensure data integrity during transfer across a network
• Data management and storage

Given the vast potential of genomic methods to generate genome-wide data, laboratories will need to actively consider precisely which data they will store and the retention time of that data. In some cases institutional IT departments and policies may be able to accommodate data within centralized storage facilities. However, there may be many cases where this is not possible and the problems will need to be addressed locally.

2. Data processing infrastructure and capacity

2.1 The computing hardware and other IT infrastructure should be fit for purpose.

Specific requirements will vary according to the platform and style of analysis (i.e. the requirements for small-scale targeted sequencing will be different to those for whole-exome or whole-genome sequencing). The choice of computing hardware specification (i.e. type and number of CPUs or GPUs, amount of RAM, type and amount of storage, platform and operating system) will be governed by the chosen software/analytical pipelines (see Bioinformatics chapter).

2.2 Computing hardware should at least meet the minimum specifications of the software.

Further consideration should be given to equipment which exceeds the minimum specifications in order to reduce processing time, and hence turnaround time.

2.3 The laboratory should show that the choice of hardware and software can be maintained appropriately, including installation, updates and troubleshooting.

The choice of operating system will also be largely determined by the specific software and analytical programs being used. At a minimum, a 64-bit operating system should be installed (memory allocation can be severely restricted in some/all 32-bit operating systems).
2.4 The chosen computing hardware should be shown capable of performing the required analyses and/or capable of running the chosen software using training/control datasets (i.e. datasets with characteristics consistent with clinical samples to be analysed).

Datasets may be supplied by software providers, or may be obtained externally.

3. Data Transfer

3.1 Wherever possible, data should not be transferred using USB “memory sticks” or external hard drives.

3.2 Consideration should be given to the use of high-speed network connections between the various components of the computing hardware.

Genomic methods have the capacity to generate very large data files. During analysis data may need to be transferred to different computing hardware (i.e. from sequencer to analytical computer or from sequencer to storage location). A speed of 1 gigabit/second (i.e. Gigabit Ethernet) is suggested as a minimum data transfer speed. This requirement will affect network cables as well as routers/switches. Infrastructure capable of faster transfers will reduce delays introduced by the transfer of large files.

3.3 Confidentiality of data should be maintained during data transfer.

3.4 Appropriate steps should be taken to ensure that data corruption does not occur during transfer.

This is a significant issue, especially as files increase in size. Laboratories should implement a system to show that data transferred between different elements of their computing hardware have not been corrupted during the transfer. Consideration should also be given to similar mechanisms for data transferred to external organisations for analysis. Checksums for individual files or compressed files can be generated using a variety of software packages.

3.4.1 Resources

See also resources section in Ethical & Legal Issues.

4. Data management and storage

4.1 The laboratory should determine and justify which data are to be stored.
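As an illustration of the checksum approach mentioned in 3.4, the following Python sketch compares the checksum of a file before and after transfer. It is a minimal example under stated assumptions, not a prescribed method: the Guidelines do not name an algorithm, so SHA-256 from the Python standard library is an assumption, and the function names `sha256_of` and `transfer_is_intact` are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file as a hexadecimal string.

    The file is read in chunks so that very large files (for example
    FASTQ or BAM files) do not need to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_intact(source_path, destination_path):
    """Return True if the source and destination files have equal checksums."""
    return sha256_of(source_path) == sha256_of(destination_path)
```

In practice the checksum of the source file would usually be recorded at the time of generation and stored alongside the data, so that the same value can be re-checked after each later transfer, including transfers to external organisations.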
During data generation and analysis, a series of files of varying sizes is created. In Sanger sequencing, the stored data include unedited chromatograms (“raw” data), edited chromatograms, sequence alignments and summarized results/reports. Equivalent components can be identified within NGS pipelines, although the amount of storage required will be significantly larger. Some genomic data may need to be repeatedly accessed and analysed over a greater period than expected in typical data retention policies (e.g. whole genome or whole exome data). Where possible, the laboratory should determine the feasibility of very long term data retention. The laboratory should develop a formal data management policy which minimizes the possibility of data loss. During analysis, genomic data will be transferred to a number of different computers for analysis and/or storage.

4.2 The laboratory should ensure that data are stored in a manner that prevents loss in the event of hardware failure (i.e. data should have redundant backup).

The specific choice of computing hardware for storage purposes will vary between laboratories. The specifications of storage devices will be substantially different from the specifications of processing devices (see above). The important characteristics of storage devices will be quantity, speed and redundancy. It is suggested that “solid state” devices are inappropriate for long-term data storage as their life-span has not been empirically determined. Cloud storage has the potential for reducing the loss of data due to hardware failure, and is readily scalable, but issues of bandwidth for access and confidentiality of identifiable data remain major concerns.

4.2.1 Resources
• The potential for cloud computing services in Australia. A Lateral Economics report to Macquarie Telecom. October 2011.
• Financial Considerations for Government use of Cloud Computing. Australian Dept of Finance & Deregulation. Nov 2011.
• See also section in Ethical & Legal Issues.

4.2.2 Other Resources

This section lists some of the documents which address quality issues in genomic sequencing generally. More specific references are provided in subsequent sections of this document.
• Gargis et al. Assuring the quality of next-generation sequencing in clinical laboratory practice. Nat Biotechnol 2012 Nov;30(11):1033-6. doi: 10.1038/nbt.2403. PubMed PMID: 23138292.
• CDC. Next Generation Sequencing: Standardization of Clinical Testing (Nex-StoCT) Working Groups.
• CDC. Next-generation Sequencing: Standardization of Clinical Testing (Nex-StoCT) Workgroup Principles and Guidelines.
• Weiss et al. Best Practice Guidelines for the Use of Next Generation Sequencing (NGS) Applications in Genome Diagnostics: A National Collaborative Study of Dutch Genome Diagnostic Laboratories. Hum Mutat 2013 Jun 17. PubMed PMID: 23776008.
• Schrijver et al. Opportunities and challenges associated with clinical diagnostic genome sequencing: a report of the Association for Molecular Pathology. J Mol Diagn 2012 Nov;14(6):525-40.
• Next Generation Sequencing (NGS) guidelines for somatic genetic variant detection. New York State Department of Health, 2015 update.
• Rehm et al. ACMG clinical laboratory standards for next-generation sequencing. Genet Med 2013 Jul 25. doi: 10.1038/gim.2013.92. [Epub ahead of print] PubMed PMID: 23887774.

Acknowledgements

The RCPA wishes to acknowledge Dr Melody Caramins (Editor-in-Chief, and Reporting chapter writing committee), the immediate past Editor-in-Chief Professor Graeme Suthers, and the Chapter Editors Dr Janice Fletcher (Ethical and legal issues), Associate Professor Bruce Bennetts (Wet Lab), Professor Leslie Burnett (Bioinformatics), Dr Cliff Meldrum (Reporting) and Professor Graham Taylor (IT infrastructure); the funding provided by the Australian Government Department of Health & Aging, and the RCPA Genetics Advisory Committee.
Writing committee contributors included:
Ethical and legal issues chapter: Associate Professor David Amor, Professor Leslie Burnett, Dr Mark Davis, Mr Mike Ralston, and Associate Professor Meredith Wilson.
Wet Lab chapter: Dr Desiree DuSart, Dr Andrew Fellowes, Professor Nelson Tang, Dr Elizabeth Tegg.
Bioinformatics chapter: Dr Douglas Chesher, Dr Warren Kaplan, Dr Karin Kassahn, and Dr Joshua Ho.
Infrastructure chapter: Dr Denis Bauer, Mr Ken Doig, Dr Arthur Hsu, and Associate Professor Andrew Lonie.
Reporting chapter: Professor Leslie Burnett, Dr Melody Caramins, and Dr Peter Taylor.