SUPPLEMENTARY INFORMATION

1. Materials and Methods
2. Supplementary Figures
3. Supplementary Tables
4. Authors’ Contribution
5. Supplementary References

1. Materials and Methods

Patients. A total of 31 polycythemia vera (PV) patients were enrolled in the discovery study, and the results were validated in two additional cohorts totaling 59 PV patients. Clinical data were available for 69 patients across all three cohorts. Blood samples were collected at the University of Utah School of Medicine, and the collection was approved by the Institutional Review Board. Written consent was obtained from all patients in accordance with the Declaration of Helsinki. The LeukemiaNet criteria for clinicohematologic response were used to assess treatment responses [1]. The allelic fraction of JAK2V617F in granulocytes (GNC) and T-cells was determined by whole-exome sequencing (WXS) for patients in the discovery study and by deep Ion AmpliSeq sequencing for patients in the validation study. The WXS data were validated by deep Ion AmpliSeq sequencing. The fraction of 9p acquired uniparental disomy (aUPD) in all patients was analyzed by high-resolution SNP arrays. The fraction of 9p aUPD in GNC was quantitatively measured as described in our earlier study [3].

Isolation of GNC and T-cells and DNA extraction. Granulocyte (GNC) and mononuclear cell fractions were isolated according to a previously published protocol [4]. T-cells were positively selected from mononuclear cells with the CD3+ MicroBead Kit (Miltenyi Biotec Inc, Auburn, CA). Genomic DNA was isolated from granulocytes and T-cells using the Gentra-Puregene Kit (Qiagen, Valencia, CA).

Illumina library construction. Illumina libraries were constructed according to the manufacturer’s protocol with modifications as described on the HGSC website (https://hgsc.bcm.edu/sites/default/files/documents/Illumina_Barcoded_PairedEnd_Capture_Library_Preparation.pdf). Libraries were prepared using Beckman robotic workstations (Biomek NXp and FXp models).
Briefly, 1 µg of genomic DNA in a 100 µl volume was sheared into fragments of approximately 300-400 base pairs in a Covaris plate with the E210 system (Covaris, Inc., Woburn, MA), followed by end-repair, A-tailing and ligation of the Illumina multiplexing PE adaptors. Pre-capture Ligation-Mediated PCR (LM-PCR) was performed for 7 cycles of amplification using the 2X SOLiD Library High Fidelity Amplification Mix (a custom product manufactured by Invitrogen). Universal primer IMUX-P1.0 and a pre-capture barcoded primer IBC were used in the PCR amplification. In total, a set of 12 such barcoded primers was used on these samples. Purification was performed with Agencourt AMPure XP beads after enzymatic reactions. Following the final XP bead purification, the quantity and size distribution of the pre-capture LM-PCR product were determined using the LabChip GX electrophoresis system (PerkinElmer).

Exome capture. Four pre-capture libraries were pooled together (approximately 250 ng/sample, 1 µg per pool) and hybridized in solution to the HGSC VCRome 2.1 design1 (42Mb, NimbleGen) according to the manufacturer’s protocol, the NimbleGen SeqCap EZ Exome Library SR User’s Guide (Version 2.2), with minor revisions. Human COT1 DNA and full-length Illumina adaptor-specific blocking oligonucleotides were added to the hybridization to block repetitive genomic sequences and the adaptor sequences. Post-capture LM-PCR amplification was performed using the 2X SOLiD Library High Fidelity Amplification Mix with 14 cycles of amplification. After the final AMPure XP bead purification, the quantity and size of the capture library were analyzed using the Agilent Bioanalyzer 2100 DNA Chip 7500. The efficiency of the capture was evaluated by performing a qPCR-based quality check on the four standard NimbleGen internal controls. Successful enrichment of the capture libraries was estimated to correspond to a ΔCt value of 6 to 9 relative to the non-enriched samples.
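As a rough guide to what a ΔCt of 6 to 9 means, the sketch below converts ΔCt into fold enrichment under the common assumption that each qPCR cycle doubles the product (100% amplification efficiency). The function name and the `efficiency` parameter are illustrative and are not taken from the NimbleGen protocol.

```python
def fold_enrichment(delta_ct: float, efficiency: float = 1.0) -> float:
    """Fold enrichment implied by a qPCR delta-Ct.

    efficiency=1.0 is an assumption: the product doubles every cycle.
    Lower values model less efficient amplification.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return (1.0 + efficiency) ** delta_ct


if __name__ == "__main__":
    # Range of delta-Ct values reported for the capture libraries.
    for delta_ct in (6, 9):
        print(f"delta-Ct {delta_ct}: ~{fold_enrichment(delta_ct):.0f}-fold")
```

Under that assumption, the reported range corresponds to roughly 64- to 512-fold enrichment of the control loci.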
Aliquots of enriched libraries (10 nM) were submitted for sequencing.

Illumina sequencing. Library templates were prepared for sequencing using Illumina’s cBot cluster generation system with TruSeq PE Cluster Generation Kits (Part no. PE-401-3001). Briefly, these libraries were denatured with sodium hydroxide and diluted to 3-6 pM in hybridization buffer to achieve a load density of ~800K clusters/mm². Each library pool was loaded in a single lane of a HiSeq flow cell, and each lane was spiked with 1% phiX control library for run quality control. The sample libraries then underwent bridge amplification to form clonal clusters, followed by hybridization with the sequencing primer. Sequencing runs were performed in paired-end mode on the Illumina HiSeq 2000 platform. Using the TruSeq SBS Kits (Part no. FC-401-3001), sequencing-by-synthesis reactions were extended for 101 cycles from each end, with an additional 7 cycles for the index read. Real Time Analysis (RTA) software was used for image analysis and base calling. Sequencing runs generated approximately 300-400 million successful reads on each lane of a flow cell, yielding 9-10 Gb per sample. With these sequencing yields, samples achieved an average of 96% of the targeted exome bases covered to a depth of 20X or greater.

Exome sequencing data processing and quality control. Exome sequence data processing and analysis were performed using the standard pipelines established at the Human Genome Sequencing Center (HGSC) of Baylor College of Medicine [5, 6]. Read sequences were mapped to the human reference genome (GRCh37) with BWA [7]. Duplicates were identified in all BAM files using Picard, and the files were then recalibrated and realigned with GATK [8]. Quality control modules were used to compare genotypes derived from Affymetrix arrays and sequencing data to ensure concordance.
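The array-versus-sequencing comparison above can be pictured as a simple concordance rate over the SNPs genotyped on both platforms. The following is a minimal sketch of that idea only; it is not the HGSC quality control module, and the SNP identifiers and genotype strings are hypothetical.

```python
from typing import Mapping


def genotype_concordance(array_calls: Mapping[str, str],
                         seq_calls: Mapping[str, str]) -> float:
    """Fraction of shared SNPs whose unordered genotypes agree.

    Genotypes are two-letter strings such as "AG"; "AG" and "GA" are
    treated as the same call.
    """
    shared = [snp for snp in array_calls if snp in seq_calls]
    if not shared:
        raise ValueError("no SNPs in common between the two call sets")
    matches = sum(
        1 for snp in shared
        if sorted(array_calls[snp]) == sorted(seq_calls[snp])
    )
    return matches / len(shared)


if __name__ == "__main__":
    # Hypothetical calls for one sample.
    array_calls = {"snp1": "AG", "snp2": "CC", "snp3": "TT", "snp4": "AG"}
    seq_calls = {"snp1": "GA", "snp2": "CC", "snp3": "CT", "snp5": "AA"}
    rate = genotype_concordance(array_calls, seq_calls)
    print(f"concordance over shared SNPs: {rate:.2f}")
```

A low rate between an array and a sequencing library that are supposed to come from the same individual would suggest a sample swap or contamination, which is the kind of problem such a check is meant to catch.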
Genotypes from SNP arrays were also used to monitor for low levels of cross-contamination between samples from different individuals. After removal of PCR duplicates, we obtained an average of 125x (GNC) and 132x (T-cells) coverage of the targeted protein-coding regions. 94.9% (GNC) and 95.0% (T-cells) of the targeted bases were covered at least 20-fold.

Mutation calling and annotation. The Atlas-SNP2 algorithm was used to identify somatic single-nucleotide variants in targeted exons [9]. We also applied Pindel to call small-to-medium-sized insertions and deletions [10]. A minimum of 4 high-quality supporting reads and a minimum mutant allele fraction of 0.05 were required for mutation calling. Somatic mutations and germline variants were annotated using information from publicly available databases, including dbSNP build 135 [11], ANNOVAR [12] and COSMIC v57 [13].

Mutation validation. Mutation validation was done using the Ion Torrent Personal Genome Machine (PGM, Life Technologies Corporation). Only somatic non-silent mutations were selected for validation. To make the Ion libraries, amplicons from the GNC and T-cells were barcoded, pooled, sheared by enzymatic digestion, adaptor-ligated, size-selected and amplified according to the manufacturer’s instructions. The Ion Torrent sequencing data were analyzed using Torrent Suite Software v3.0. The average read depth obtained per base was 2226x and 2034x for the GNC and T-cell pools, respectively. A minimum of 50 high-quality supporting reads and a minimum mutant allele fraction of 0.05 were required to define a validated mutation.

Design of Ion AmpliSeq arrays. JAK2 was sequenced on the Ion Torrent PGM platform for 51 of the 59 PV patients in the validation study. Forty-five patients were analyzed using the Ion AmpliSeq Cancer Hotspot Panel v2 kit (Life Technologies), 17 patients using the Ion AmpliSeq custom array, and 11 patients by both arrays.
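One way to read the panel counts above is that the 11 patients run on both arrays are included in both the 45 and the 17. Under that reading (an assumption, since the text does not spell it out), inclusion-exclusion recovers the 51 patients sequenced. The helper name below is illustrative.

```python
def unique_patients(panel_a: int, panel_b: int, both: int) -> int:
    """Number of distinct patients covered by two overlapping panels."""
    if both > min(panel_a, panel_b):
        raise ValueError("overlap cannot exceed either panel's count")
    return panel_a + panel_b - both


if __name__ == "__main__":
    hotspot, custom, both = 45, 17, 11
    total = unique_patients(hotspot, custom, both)
    print(f"hotspot panel only: {hotspot - both}")
    print(f"custom array only:  {custom - both}")
    print(f"both arrays:        {both}")
    print(f"distinct patients:  {total}")
```

The alternative reading, in which the three counts are mutually exclusive, would give 73 patients, which exceeds the 59 in the validation study, so the overlapping reading is the one consistent with the stated totals.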
The cancer hotspot array includes 2,800 cancer ‘hot-spot’ codons from 50 frequently mutated cancer genes. Amplicon size varied from 80 to 140 base pairs. The Ion AmpliSeq custom array was designed for the 42 most frequently mutated genes in myeloproliferative disorders. The coding exons of the 42 selected genes were extracted using the UCSC Table Browser (hg19) and submitted to Ion AmpliSeq Designer (pipeline version 1.2) using settings for standard DNA and an amplicon range of 125-225 base pairs. The resulting custom design consists of a total of 1202 amplicons. The average amplicon size was 200 base pairs.

Ion Torrent library construction. The Ion AmpliSeq Library Kit 2.0 (Cat#4480441, Life Technologies), consisting of AmpliSeq PCR and library preparation reagents, was used to prepare template DNA for sequencing. AmpliSeq reactions were performed separately for pool 1 and pool 2 for each sample. Each AmpliSeq reaction was set up using 10 ng of DNA as input. Thermocycling conditions were an initial denaturation for 2 min at 99°C followed by 16 cycles of 15 s at 99°C (denaturation) and 4 min at 60°C (annealing and extension). Libraries were prepared using Beckman robotic workstations. Following the AmpliSeq reaction, 2 µl of FuPa reagent was added to remove PCR adaptor regions and repair fragment ends. Ion Xpress™ Barcode Adapters were then ligated to each pool. The post-ligation products were purified using Agencourt AMPure XP beads. Thermocycling conditions were an initial denaturation for 2 min at 98°C followed by 7 cycles of 15 s at 98°C (denaturation) and 1 min at 60°C (annealing and extension). Agencourt XP® beads were used to purify DNA after each reaction step. PCR products were purified using the above SPRI beads, followed by quantification and size-distribution analysis using the LabChip GX electrophoresis system (PerkinElmer). Four to eight samples (8-16 libraries) were sequenced per run on the Ion Torrent PGM instrument.
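The two thermocycling programs described above can be written down as data, which makes the programmed hold time of each easy to compare. This is an illustrative sketch: the class and variable names are invented here, and the totals ignore ramp times and any final hold, so real run times will be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThermocyclerProgram:
    initial_denaturation_s: int  # single step before cycling
    cycles: int
    denaturation_s: int          # per cycle
    anneal_extend_s: int         # per cycle (combined step at 60 degrees C)

    def hold_time_s(self) -> int:
        """Sum of programmed hold times, excluding temperature ramps."""
        per_cycle = self.denaturation_s + self.anneal_extend_s
        return self.initial_denaturation_s + self.cycles * per_cycle


# 2 min at 99 C, then 16 cycles of 15 s at 99 C and 4 min at 60 C.
ampliseq_pcr = ThermocyclerProgram(120, 16, 15, 240)
# 2 min at 98 C, then 7 cycles of 15 s at 98 C and 1 min at 60 C.
second_pcr = ThermocyclerProgram(120, 7, 15, 60)

if __name__ == "__main__":
    for name, program in (("AmpliSeq PCR", ampliseq_pcr),
                          ("second PCR", second_pcr)):
        print(f"{name}: {program.hold_time_s() / 60:.2f} min of hold time")
```

The same arithmetic explains the library count per run: with two primer pools per sample, four to eight samples give 8 to 16 libraries.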
The library templates were prepared for sequencing using the Life Technologies Ion OneTouch v2 DL protocols and reagents. Briefly, library fragments were clonally amplified onto Ion Sphere Particles (ISPs) through emulsion PCR and then enriched for template-positive ISPs. More specifically, PGM emulsion PCR reactions used the Ion OneTouch 200 Template Kit v2 DL (Life Technologies, Part no. 4480285), and, as specified in the accompanying protocol, emulsions and amplification were generated using the Ion OneTouch System (Life Technologies, Part no. 4467889). Following recovery, enrichment was completed by selectively binding the ISPs containing amplified library fragments to streptavidin-coated magnetic beads, removing empty ISPs through washing steps, and denaturing the library strands to allow collection of the template-positive ISPs. For all reactions, these steps were accomplished using the Life Technologies ES module of the Ion OneTouch System, and template-positive ISPs were quantified using the Guava EasyCyte 5 (Millipore Technologies), obtaining >90% enrichment efficiency for all reactions. Approximately 20 million template-positive ISPs per run were deposited onto the Ion 318C chips (Life Technologies, Part no. 4469497) by a series of centrifugation steps that alternated the chip directionality. Sequencing was performed with the Ion PGM 200 Sequencing Kit (Life Technologies, 4474004) using the 440 flow (“200bp”) run format.

Ion PGM sequencing data processing and mutation calling. The PGM sequencing data were processed using Ion Torrent Suite Software v3.0. Reads were aligned to human reference genome build 37 (NCBI) using TMAP with default parameters. Mutations were called using BAM files from the tumor and matched normal samples. Atlas-SNP51 was run for SNP calling. The variants were further filtered to remove those supported by fewer than 5 sequencing reads or present in less than 8% of aligned reads.
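The read-count and allele-fraction filter described above can be sketched as a small predicate. This is a minimal illustration of the two stated thresholds only, not the Atlas-SNP implementation; the `Variant` record and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    variant_reads: int  # reads supporting the variant allele
    total_reads: int    # all aligned reads at the position


def passes_snv_filter(v: Variant,
                      min_reads: int = 5,
                      min_fraction: float = 0.08) -> bool:
    """Keep a variant only if it has at least `min_reads` supporting reads
    and is present in at least `min_fraction` of the aligned reads."""
    if v.total_reads <= 0:
        return False
    fraction = v.variant_reads / v.total_reads
    return v.variant_reads >= min_reads and fraction >= min_fraction


if __name__ == "__main__":
    # Hypothetical candidates.
    candidates = [
        Variant("too_few_reads", 4, 20),      # 20% of reads, but only 4 reads
        Variant("low_fraction", 30, 1000),    # 30 reads, but only 3% of reads
        Variant("well_supported", 200, 2000), # 200 reads, 10% of reads
    ]
    for v in candidates:
        print(v.name, "kept" if passes_snv_filter(v) else "removed")
```

Both conditions must hold, so a variant seen in a high fraction of very few reads is removed just as one seen in many reads at a very low fraction is.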
For indels, the variant allele had to be supported by at least 10 sequencing reads. In addition, at least one variant-supporting base was required to be Q30 or better and to lie in the central portion of the read. Finally, reads harboring the variant had to be observed in both forward and reverse orientations.

DNA copy number analysis. DNA copy number analysis was done for all 90 patients. Among them, 50 patients were analyzed by Affymetrix SNP 6.0 array and 40 patients by Illumina 610 SNP array. The fraction of the cell population harboring the 9p aUPD event was quantified from SNP genotyping signal intensities as described in our earlier study [3].

2. Supplementary Figures

Supplementary Figure 1 Correlation of the mutational pattern with the prevalence of transformation. [Plot; x-axis: subgroups I, II and III.] The correlation analysis was done for pooled patients from all 3 studies for whom clinical data were available (n=69). AML, acute myeloid leukemia; MF, myelofibrosis transformation. The P value for the chi-square test is shown.

Supplementary Figure 2 [Panel a: B allele fraction (y-axis, 0 to 1) plotted against genomic position (x-axis). Panel b: JAK2 V617F.] Analysis of the relationship of 9p aUPD and JAK2V617F in a PV patient. (a) The aUPD event observed at chromosome 9p and a detailed view of the JAK2 region (lower panel, colored in red). The distortion of the SNP allelic fraction shows complete aUPD across the JAK2 gene. (b) IGV view of JAK2V617F; the mutated base T is colored in red. The allelic fraction of V617F is 0.24. Our data indicate that the majority of the PV clone carried 9p aUPD and only a small subclone also carried the JAK2V617F mutation.

Supplementary Figure 3 The JAK2 haplotype analysis across 3 subgroups.
The 46/1 risk haplotype10-12 analysis was performed using SNP array data from the discovery study to determine whether there is any association between this risk haplotype and the progression of GNC from subgroup I to subgroup II. No significant difference in the frequency of the risk haplotype was observed between the subgroup-I and subgroup-II patients (P > 0.05).

Supplementary Figure 4 The distribution of 9p aUPD in a representative PV case. The B allele fraction (B Allele Freq) derived from the SNP array is plotted across chromosome 9. Positions on the x-axis are sorted in ascending order from the p arm to the q arm. Mb, megabase. Probes covering the JAK2 locus are colored in red.

Supplementary Figure 5 The candidate genes identified within the aUPD locus on chromosome 9p. Only variants that lost the wild-type allele were counted, and only genes recurrent in at least 3 PV patients are displayed. The upper panel indicates the total number of patients exhibiting loss of heterozygosity of each gene. The bottom panel indicates the total number of somatic and germline events detected in each gene.

3. Supplementary Tables

Supplementary Table 1. Functional annotation of genes identified within the aUPD locus on chromosome 9p.
Gene # Patient # nonsilent FREM1 11 12 FRAS1-related extracellular matrix protein DOCK8 11 9 dedicator of cytokinesis KANK1 10 12 Kank proteins KDM4C 9 11 JmjC domain-containing histone demethylation protein CCDC171 9 8 CNTLN 9 7 uncharacterized protein centlein, centrosomal protein ADAMTSL1 8 7 Protein annotation a disintegrin and metalloproteinase with thrombospondin motif Protein Function plays a role in epidermal differentiation and is required for epidermal adhesion during embryonic development Guanine nucleotide exchange factors interact with Rho GTPases and are components of intracellular signaling networks. functions in cytoskeleton formation by regulating actin polymerization; a candidate tumor suppressor for renal cell carcinoma.
specifically demethylates 'Lys-9' and 'Lys-36' residues of histone H3, thereby playing a central role in histone code. Appears to be associated with the mother centriole during G1 phase and with daughter centrioles towards G1/S phase may have important functions in the extracellular matrix Cell division Epigenetic regulation Tumor suppression Transcription regulation Y Y Y Y Y Y HAUS6 8 5 a subunit of the augmin complex PTPRD 8 2 protein tyrosine phosphatase FOXD4 7 16 Forkhead-related transcription factor FOCAD 7 8 focadhesin C9orf66 6 7 KIF24 6 7 uncharacterized protein Kinesins KCNV2 6 1 Voltage-gated potassium channel subunit SMARCA2 6 1 a member of the SWI/SNF family of proteins FAM205A 5 11 C9orf72 5 5 uncharacterized protein uncharacterized protein plays a role in microtubule attachment to the kinetochore and central spindle formation. signaling molecules that regulate a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation play critical roles in the regulation of multiple processes including metabolism, cell proliferation and gene expression during ontogenesis. Potential tumor suppressor in gliomas - Y microtubule-dependent ATPases that function as molecular motors. They play important roles in intracellular vesicle transport and cell division Potassium channel subunit. Modulates channel activity by shifting the threshold and the half-maximal activation to more negative values is part of the large ATP-dependent chromatin remodeling complex SNF/SWI, which is required for transcriptional activation of genes normally repressed by chromatin.
- Y - Y Y Y Y Y DMRT2 5 5 doublesex and mab-3 related transcription factor Pumilio domain-containing protein AT-rich interactive domain-containing protein KIAA0020 5 5 ARID3C 5 3 INSL4 5 1 insulin-like 4 protein TEK 5 1 TEK tyrosine kinase, endothelial GLDC 5 0 glycine dehydrogenase MOB3B 5 0 kinase activator 3b SLC1A1 5 0 FAM154A 4 6 Sodium-dependent glutamate/aspartate transporter uncharacterized protein one of the candidates for sex-determining gene(s) on chr 9 Y - have roles in embryonic patterning, cell lineage gene regulation, cell cycle control, transcriptional regulation and possibly in chromatin structure modification. May play an important role in trophoblast development and in the regulation of bone formation regulates angiogenesis, endothelial cell survival, proliferation, migration, adhesion and cell spreading, reorganization of the actin cytoskeleton, but also maintenance of vascular quiescence. binds to glycine and enables the methylamine group from glycine to be transferred to the T protein. shares similarity with the yeast Mob1 protein, which binds Mps1p, a protein kinase essential for spindle pole body duplication and mitotic checkpoint regulation. play an essential role in transporting glutamate across plasma membranes.
- Y Y Y Y DDX58 4 4 DEAD box proteins GLIS3 4 4 Zinc finger protein IFT74 4 4 PTPLAD2 4 4 UBAP2 4 4 IL33 4 3 Coiled-coil domain-containing protein Protein-tyrosine phosphatase-like A domain-containing protein ubiquitin associated protein interleukin 33 TTC39B 4 3 TAF1L 4 2 DENND4C 4 1 TOPORS 4 1 tetratricopeptide repeat protein transcription initiation factor TFIID subunit DENN/MADD domain containing 4C Tumor suppressor p53-binding protein putative RNA helicases, may play important roles in granulocyte production and differentiation, bacterial phagocytosis and in the regulation of cell migration functions as both a repressor and activator of transcription - Y Responsible for the dehydration step in very long-chain fatty acids (VLCFAs) synthesis Cytokine that binds to and signals through IL1RL1/ST2, Induces T-helper type 2-associated cytokines May act as a functional substitute for TAF1/TAFII250 during male meiosis, when sex chromosomes are transcriptionally silenced functions as an ubiquitin-protein E3 ligase, Probable tumor suppressor involved in cell growth, cell proliferation and apoptosis that regulates p53/TP53 stability through ubiquitin-dependent degradation. May regulate chromatin modification through sumoylation of several chromatin modification-associated proteins. Y Y KIAA2026 4 0 uncharacterized protein interferon epsilon methylthioadenosine phosphorylase IFNE 3 4 MTAP 3 3 ANKRD18B 3 2 BNC2 3 2 DMRT3 3 1 NOL6 3 1 AQP3 3 0 water channel protein aquaporin 3 DMRTA1 3 0 IFNB1 3 0 doublesex- and mab-3-related transcription factor Fibroblast interferon KIAA1432 3 0 ankyrin repeat domain-containing protein zinc finger protein basonuclin-2 doublesex and mab-3 related transcription factor Nucleolar RNA-associated protein Connexin-43-interacting protein plays a major role in polyamine metabolism and is important for the salvage of both adenine and methionine. - Probable transcription factor specific for skin keratinocytes.
May play a role in the differentiation of spermatozoa and oocytes May regulate transcription during sexual development associated with ribosome biogenesis through an interaction with pre-rRNA primary transcripts. Involved in skin hydration, wound healing, and tumorigenesis. May be involved in sexual development Has antiviral, antibacterial and anticancer activities Required for phosphorylation and localization of GJA1 Y Y SLC24A2 3 0 a member of the calcium/cation antiporter superfamily of transport proteins mediate the extrusion of one Ca2+ ion and one K+ ion in exchange for four Na+ ions.
Y, related; -, the function of this protein has not been determined.

4. Authors’ contribution

L.W. conducted the major bioinformatics analyses of the sequencing and SNP array data, and wrote and revised the manuscript. D.A.W. and J.T.P. conceived the study, supervised the implementation of the research plan, and reviewed and revised the manuscript. S.I.S. prepared genomic DNA samples. J.T.P. and K.H. collected and interpreted clinical data and obtained the necessary regulatory documents and informed consent from the study subjects. K.W. contributed the AmpliSeq array design and analysis pipeline. D.M.M. managed the sequencing and mutation validation pipeline. K.W. provided further analyses of high-resolution copy-number arrays and assisted with interpretation of data. J.D. contributed the WXS pipeline. J.G.R. managed the pipeline for sequencing data mapping, realignment and recalibration. D.M.M. managed the production pipeline. R.A.G. reviewed and revised the manuscript.

5. References

1. Barosi G, Birgegard G, Finazzi G, Griesshammer M, Harrison C, Hasselbalch HC et al. Response criteria for essential thrombocythemia and polycythemia vera: result of a European LeukemiaNet consensus conference. Blood 2009; 113(20): 4829-33.
2. Nussenzveig RH, Swierczek SI, Jelinek J, Gaikwad A, Liu E, Verstovsek S et al. Polycythemia vera is not initiated by JAK2V617F mutation. Experimental hematology 2007; 35(1): 32-8.
3. Wang K, Swierczek S, Hickman K, Hakonarson H, Prchal JT. Convergent mechanisms of somatic mutations in polycythemia vera. Discovery medicine 2011; 12(62): 25-32.
4. Prchal JT, Throckmorton DW, Carroll AJ, 3rd, Fuson EW, Gams RA, Prchal JF. A common progenitor for human myeloid and lymphoid cells. Nature 1978; 274(5671): 590-1.
5. Biankin AV, Waddell N, Kassahn KS, Gingras MC, Muthuswamy LB, Johns AL et al. Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes. Nature 2012; 491(7424): 399-405.
6. TCGA. Comprehensive molecular characterization of human colon and rectal cancer. Nature 2012; 487(7407): 330-7.
7. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009; 25(14): 1754-60.
8. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature genetics 2011; 43(5): 491-8.
9. Shen Y, Wan Z, Coarfa C, Drabek R, Chen L, Ostrowski EA et al. A SNP discovery method to assess variant allele probability from next-generation resequencing data. Genome research 2010; 20(2): 273-80.
10. Ye K, Schulz MH, Long Q, Apweiler R, Ning Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 2009; 25(21): 2865-71.
11. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM et al. dbSNP: the NCBI database of genetic variation. Nucleic acids research 2001; 29(1): 308-11.
12. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic acids research 2010; 38(16): e164.
13. Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic acids research 2011; 39(Database issue): D945-50.
14. Wang K, Li M, Hadley D, Liu R, Glessner J, Grant SF et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome research 2007; 17(11): 1665-74.
15. Rasmussen M, Sundstrom M, Goransson Kultima H, Botling J, Micke P, Birgisson H et al. Allele-specific copy number analysis of tumor samples with aneuploidy and tumor heterogeneity. Genome biology 2011; 12(10): R108.