SUPPLEMENTARY INFORMATION

1. Materials and Methods
2. Supplementary Figures
3. Supplementary Tables
4. Authors’ Contribution
5. Supplementary References

1. Materials and Methods

Patients. Blood samples were collected from a total of 31 PV patients for this study. Sample collection was performed at the University of Utah School of Medicine and was approved by the Institutional Review Board. Written consent was obtained from all patients in accordance with the Declaration of Helsinki. The LeukemiaNet criteria for clinicohematologic response were used to assess treatment responses in patients [1].

Isolation of GNC and T-cells and DNA extraction. Granulocyte (GNC) and mononuclear cell fractions were isolated according to a previously published protocol [2]. T-cells were positively selected from mononuclear cells using the CD3+ MicroBead Kit (Miltenyi Biotec Inc, Auburn, CA). Genomic DNA was isolated from granulocytes and T-cells using the Gentra-Puregene Kit (Qiagen, Valencia, CA).

Illumina library construction. Illumina libraries were constructed according to the manufacturer’s protocol with modifications as described on the HGSC website (https://hgsc.bcm.edu/sites/default/files/documents/Illumina_Barcoded_PairedEnd_Capture_Library_Preparation.pdf). Libraries were prepared using Beckman robotic workstations (Biomek NXp and FXp models). Briefly, 1 µg of genomic DNA in a 100 µl volume was sheared into fragments of approximately 300-400 base pairs in a Covaris plate with the E210 system (Covaris, Inc., Woburn, MA), followed by end-repair, A-tailing and ligation of the Illumina multiplexing PE adaptors. Pre-capture ligation-mediated PCR (LM-PCR) was performed for 7 cycles of amplification using the 2X SOLiD Library High Fidelity Amplification Mix (a custom product manufactured by Invitrogen). Universal primer IMUX-P1.0 and a pre-capture barcoded primer IBC were used in the PCR amplification. In total, a set of 12 such barcoded primers was used on these samples.
Purification was performed with Agencourt AMPure XP beads after enzymatic reactions. Following the final XP bead purification, the quantity and size distribution of the pre-capture LM-PCR product were determined using the LabChip GX electrophoresis system (PerkinElmer).

Exome capture. Four pre-capture libraries were pooled together (approximately 250 ng/sample, 1 µg per pool) and hybridized in solution to the HGSC VCRome 2.1 design [1] (42 Mb, NimbleGen) according to the manufacturer’s protocol, NimbleGen SeqCap EZ Exome Library SR User’s Guide (Version 2.2), with minor revisions. Human COT1 DNA and full-length Illumina adaptor-specific blocking oligonucleotides were added into the hybridization to block repetitive genomic sequences and the adaptor sequences. Post-capture LM-PCR amplification was performed using the 2X SOLiD Library High Fidelity Amplification Mix with 14 cycles of amplification. After the final AMPure XP bead purification, the quantity and size of the capture library were analyzed using the Agilent Bioanalyzer 2100 DNA Chip 7500. The efficiency of the capture was evaluated by performing a qPCR-based quality check on the four standard NimbleGen internal controls. Successful enrichment of the capture libraries was estimated to correspond to a ΔCt value of 6 to 9 relative to the non-enriched samples. Aliquots of enriched libraries (10 nM) were submitted for sequencing.

Illumina sequencing. Library templates were prepared for sequencing using Illumina’s cBot cluster generation system with TruSeq PE Cluster Generation Kits (Part no. PE-401-3001). Briefly, these libraries were denatured with sodium hydroxide and diluted to 3-6 pM in hybridization buffer to achieve a load density of ~800K clusters/mm². Each library pool was loaded in a single lane of a HiSeq flow cell, and each lane was spiked with 1% phiX control library for run quality control.
The sample libraries then underwent bridge amplification to form clonal clusters, followed by hybridization with the sequencing primer. Sequencing runs were performed in paired-end mode using the Illumina HiSeq 2000 platform. Using the TruSeq SBS Kits (Part no. FC-401-3001), sequencing-by-synthesis reactions were extended for 101 cycles from each end, with an additional 7 cycles for the index read. Real Time Analysis (RTA) software was used for image analysis and base calling. Sequencing runs generated approximately 300-400 million successful reads on each lane of a flow cell, yielding 9-10 Gb per sample. With these sequencing yields, samples achieved an average of 96% of the targeted exome bases covered to a depth of 20X or greater.

Exome sequencing data processing and quality control. Exome sequence data processing and analysis were performed using the standard pipelines established at the Human Genome Sequencing Center (HGSC) of Baylor College of Medicine [3, 4]. Read sequences were mapped to the human reference genome (GRCh37) with BWA [5]. All BAM files were processed to identify duplicates using Picard and then recalibrated and realigned with GATK [6]. Quality control modules were used to compare genotypes derived from Affymetrix arrays and sequencing data to ensure concordance. Genotypes from SNP arrays were also used to monitor for low levels of cross-contamination between samples from different individuals. After removal of PCR duplicates, we obtained an average of 125x (GNC) and 132x (T-cells) coverage of the targeted protein-coding regions; 94.9% (GNC) and 95.0% (T-cells) of the targeted bases were covered at a depth of at least 20x.

Mutation calling and annotation. The Atlas-SNP2 algorithm was used to identify somatic single-nucleotide variants in targeted exons [7]. We also applied Pindel to call small to medium-sized insertions and deletions [8].
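
Calls from both tools are kept only if they meet the read-support thresholds given in this subsection. The Python sketch below illustrates such a filter; it is not the pipeline's own code. The `VariantCall` record, its field names, the placeholder gene names and the example read counts are all assumptions made for illustration, and in practice the counts would be taken from the variant callers' output.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Read counts for one candidate variant (hypothetical record layout)."""
    gene: str
    supporting_reads: int  # high-quality reads carrying the mutant allele
    total_reads: int       # high-quality reads covering the position


def allele_fraction(call: VariantCall) -> float:
    """Mutant allele fraction: supporting reads divided by total reads."""
    if call.total_reads <= 0:
        return 0.0  # an uncovered position cannot support a call
    return call.supporting_reads / call.total_reads


def passes_filter(call: VariantCall,
                  min_reads: int = 4,
                  min_fraction: float = 0.05) -> bool:
    """Apply both thresholds; the defaults are the values stated in the text."""
    return (call.supporting_reads >= min_reads
            and allele_fraction(call) >= min_fraction)


# Made-up example counts, for illustration only.
calls = [
    VariantCall("JAK2", supporting_reads=60, total_reads=120),    # kept
    VariantCall("GENE_A", supporting_reads=3, total_reads=20),    # too few reads
    VariantCall("GENE_B", supporting_reads=4, total_reads=200),   # fraction 0.02
]
kept = [c.gene for c in calls if passes_filter(c)]
print(kept)
```

Both conditions are needed: the read-count floor guards against calls at shallow positions, while the allele-fraction floor guards against a handful of erroneous reads at deeply covered positions.
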
A minimum of 4 high-quality supporting reads and a minimum mutant allele fraction of 0.05 were required for mutation calling. Somatic mutations and germline variants were annotated using information from publicly available databases, including dbSNP build 135 [9], ANNOVAR [10] and COSMIC v57 [11].

Mutation validation. Mutations were validated using the Ion Torrent Personal Genome Machine (PGM, Life Technologies Corporation). Only somatic non-silent mutations were selected for validation. To make the Ion libraries, amplicons from the GNC and T-cells were barcoded, pooled, sheared by enzymatic digestion, adaptor-ligated, size-selected and amplified according to the manufacturer’s instructions. The Ion Torrent sequencing data were analyzed using Torrent Suite Software v3.0. The average read depth obtained per base was 2226x and 2034x for the GNC and T-cell pools, respectively. A minimum of 50 high-quality supporting reads and a minimum mutant allele fraction of 0.05 were required to define a validated mutation.

Design of Ion AmpliSeq arrays. Serial granulocyte samples collected from 7 patients were sequenced on the Ion Torrent PGM platform using an AmpliSeq custom array. The Ion AmpliSeq custom array was designed for the 42 most frequently mutated genes in myeloproliferative disorders. The coding exons of the 42 selected genes were extracted using the UCSC Table Browser (hg19) and submitted to Ion AmpliSeq Designer (pipeline version 1.2) with settings for standard DNA and an amplicon range of 125-225 base pairs. The resulting custom design consists of a total of 1202 amplicons. The average amplicon size was 200 base pairs.

Ion Torrent library construction. The Ion AmpliSeq Library Kit 2.0 (Cat# 4480441, Life Technologies), consisting of AmpliSeq PCR and library preparation reagents, was used to prepare template DNA for sequencing. AmpliSeq reactions were performed separately for pool 1 and pool 2 for each sample. Each AmpliSeq reaction was set up using 10 ng of DNA as input.
Thermocycling conditions were an initial denaturation for 2 min at 99°C followed by 16 annealing and extension cycles of 15 s at 99°C and 4 min at 60°C. Libraries were prepared using Beckman robotic workstations. Following the AmpliSeq reaction, 2 µl of FuPa reagent was added to remove PCR adaptor regions and repair fragment ends. Ion Xpress™ Barcode Adapters were then ligated to each pool. The post-ligation products were purified using Agencourt AMPure XP beads. Thermocycling conditions were an initial denaturation for 2 min at 98°C followed by 7 annealing and extension cycles of 15 s at 98°C and 1 min at 60°C. Agencourt XP® beads were used to purify DNA after each reaction step. PCR products were purified using the above SPRI beads, followed by quantification and size-distribution analysis using the LabChip GX electrophoresis system (PerkinElmer). Four to eight samples (8-16 libraries) were sequenced per run on the Ion Torrent PGM instrument. The library templates were prepared for sequencing using the Life Technologies Ion OneTouch v2 DL protocols and reagents. Briefly, library fragments were clonally amplified onto Ion Sphere Particles (ISPs) through emulsion PCR and then enriched for template-positive ISPs. More specifically, PGM emulsion PCR reactions used the Ion OneTouch 200 Template Kit v2 DL (Life Technologies, Part no. 4480285), and, as specified in the accompanying protocol, emulsions and amplification were generated using the Ion OneTouch System (Life Technologies, Part no. 4467889). Following recovery, enrichment was completed by selectively binding the ISPs containing amplified library fragments to streptavidin-coated magnetic beads, removing empty ISPs through washing steps, and denaturing the library strands to allow for collection of the template-positive ISPs.
For all reactions, these steps were accomplished using the Life Technologies ES module of the Ion OneTouch System, and template-positive ISPs were quantified using the Guava EasyCyte 5 (Millipore Technologies), obtaining >90% enrichment efficiency for all reactions. Approximately 20 million template-positive ISPs per run were deposited onto the Ion 318C chips (Life Technologies, Part no. 4469497) by a series of centrifugation steps that incorporated alternating the chip directionality. Sequencing was performed with the Ion PGM 200 Sequencing Kit (Life Technologies, Part no. 4474004) using the 440 flow (“200bp”) run format.

Ion PGM sequencing data processing and mutation calling. The PGM sequencing data were processed using Ion Torrent Suite Software v3.0. Reads were aligned with TMAP against human reference genome build 37 (NCBI) using default parameters. Mutations were called using BAM files from the tumor and matched normal samples. Atlas-SNP [7] was run for SNP calling. Variants were further filtered to remove those supported by fewer than 5 sequencing reads or present in less than 8% of aligned reads. For indels, the variant allele had to be supported by at least 10 sequencing reads. In addition, at least one observation of the variant had to be of quality Q30 or better and lie in the central portion of the read. Furthermore, reads harboring the variant had to be observed in both forward and reverse orientations.

DNA copy number analysis. DNA copy number analysis was performed for all 90 patients. Among them, 50 patients were analyzed with the Affymetrix SNP 6.0 array and 40 patients with the Illumina 610 SNP array. The fraction of the cell population harboring the 9p aUPD event was quantified from SNP genotyping signal intensities as described in our earlier study [12].

2. Supplementary Figures

Supplementary Figure 1 The average read coverage for targeted regions achieved by whole-exome sequencing for the granulocytes and T cells of 31 PV patients.

Supplementary Figure 2 The base 20+ coverage achieved by whole-exome sequencing for the granulocytes and T cells of 31 PV patients. The base 20+ coverage is the percentage of targeted bases that were covered by at least 20 high-quality sequencing reads.

Supplementary Figure 3 The focal DNA copy number alterations during PV progression. BAF, B allele fraction. Mb, megabases. 2011 and 2013 indicate the years in which the samples were collected.

Supplementary Figure 4 The mutational spectrum of 31 PV patients. Only validated non-silent somatic mutations were counted. INDEL, small insertions and deletions.

Supplementary Figure 5 [Figure panels: PV3, PV7, PV10, PV17, PV20, PV23 and PV30. Axes: VarRatio-T and VarRatio-G, each from 0 to 1. Labeled genes: JAK2, NF1, ASXL1, TET2, DNMT3A and IDH2.] The potential pluripotent stem-cell-level mutations identified in 7 PV patients. The variant allele fraction (VarRatio) for granulocytes (G) and T cells (T) is shown. The sample purity was evaluated using the VarRatio of the JAK2 V617F mutation.

3. Supplementary Tables

Supplementary Table 1. The complete list of somatic and germline variants identified in 31 PV patients (see separate xls file).

4. Authors’ Contribution

L.W. conducted the major bioinformatics analyses of the sequencing and SNP array data, and wrote and revised the manuscript. D.A.W. and J.T.P. conceived the study, supervised the implementation of the research plan, and reviewed and revised the manuscript. S.I.S. prepared genomic DNA samples. J.T.P. and K.H.
collected and interpreted clinical data and obtained the necessary regulatory documents and informed consent from the study subjects. J.D. contributed to the exome sequencing analysis. K.W. contributed to the AmpliSeq array design and data analysis. D.M.M. and H.D. managed the production pipeline. R.A.G. reviewed and revised the manuscript.

5. Supplementary References

1. Barosi G, Birgegard G, Finazzi G, Griesshammer M, Harrison C, Hasselbalch HC et al. Response criteria for essential thrombocythemia and polycythemia vera: result of a European LeukemiaNet consensus conference. Blood 2009; 113(20): 4829-33.
2. Prchal JT, Throckmorton DW, Carroll AJ, 3rd, Fuson EW, Gams RA, Prchal JF. A common progenitor for human myeloid and lymphoid cells. Nature 1978; 274(5671): 590-1.
3. Biankin AV, Waddell N, Kassahn KS, Gingras MC, Muthuswamy LB, Johns AL et al. Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes. Nature 2012; 491(7424): 399-405.
4. TCGA. Comprehensive molecular characterization of human colon and rectal cancer. Nature 2012; 487(7407): 330-7.
5. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009; 25(14): 1754-60.
6. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature genetics 2011; 43(5): 491-8.
7. Shen Y, Wan Z, Coarfa C, Drabek R, Chen L, Ostrowski EA et al. A SNP discovery method to assess variant allele probability from next-generation resequencing data. Genome research 2010; 20(2): 273-80.
8. Ye K, Schulz MH, Long Q, Apweiler R, Ning Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 2009; 25(21): 2865-71.
9. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM et al. dbSNP: the NCBI database of genetic variation. Nucleic acids research 2001; 29(1): 308-11.
10. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic acids research 2010; 38(16): e164.
11. Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic acids research 2011; 39(Database issue): D945-50.
12. Wang K, Swierczek S, Hickman K, Hakonarson H, Prchal JT. Convergent mechanisms of somatic mutations in polycythemia vera. Discovery medicine 2011; 12(62): 25-32.