Hallwirth et al. Coherence analysis of vector integration patterns For Molecular Therapy

SUPPLEMENTARY METHODS

Transduction of CD34+ cells. Independent batches of MGMT-encoding MFG-based γ-retroviral vectors were collected from the supernatant of PG13 producer cells. The vectors were identical, with the exception of an MGMT-P140K mutation in the construct used for transduction under London conditions.

Paris transduction conditions entailed thawing 1×10⁷ cells on Day 1 and pre-stimulating them in X-VIVO 10 medium (Lonza, Australia) + 4% FCS + cytokines (300 ng/ml Flt3-L, 100 ng/ml TPO [R&D Systems, MN, USA], 300 ng/ml SCF [Amgen, CA, USA], 60 ng/ml IL-3 [Stem Cell Technologies, Canada]) at 0.5×10⁶ cells/ml in a total volume of 20 ml in one 85 cm² culture bag for 24 hours at 37°C, 5% CO₂. Day 2: Cells (9.2×10⁶ total) were recovered from the culture bag, resuspended in 20 ml vector supernatant with cytokines (as above + 2 µg/ml protamine) at 0.46×10⁶/ml in a Retronectin (TaKaRa, Japan)-coated 85 cm² culture bag and incubated for 24 hours. Day 3: Cells (10×10⁶ total) were recovered from the Retronectin-coated bag and resuspended in 20 ml fresh vector supernatant + cytokines + protamine (as above), reseeded into the same Retronectin-coated bag at 0.5×10⁶/ml and incubated for 24 hours. Day 4: Cells (12.8×10⁶ total) were recovered from the Retronectin-coated bag; 1×10⁷ cells from this suspension were resuspended in 20 ml fresh vector supernatant + cytokines + protamine (as above), reseeded into the same Retronectin-coated bag at 0.5×10⁶/ml and incubated for 24 hours. Day 5: Cells (15.5×10⁶ total) were recovered from the Retronectin-coated bag and washed in 4% human serum albumin (Albumex 4, CSL, Australia) as would be done for infusion into a patient. Transduction parameters were analyzed by flow cytometry (Table 1).
London transduction conditions differed from the Paris conditions in the following respects: cells were cultured in serum-free X-VIVO 10 medium supplemented with 1% human serum albumin and 20 ng/ml IL-3 instead of 60 ng/ml. IL-3, TPO and Flt3-L were sourced from Cellgenix, Germany. Cells were pre-stimulated for 40 hours, followed by two 24-hour transductions and one 6-hour transduction.

Junction fragment library construction. Genomic DNA from transduced cells was extracted using a Puregene Blood and Cell Culture DNA Kit (Qiagen, Australia), according to the manufacturer’s protocol for cultured cells. DNA, eluted in DNA Hydration Solution, was stored at -20°C until use. An LM-PCR method (ref. 16) was employed to selectively amplify junction fragments comprising LTR-derived proviral DNA and adjoining host DNA sequences. The method was adapted to improve linker ligation efficiency and to accommodate fragment library sequencing on the Illumina Genome Analyzer IIx (GAIIx) platform. Transduced gDNA was digested with Tsp509I (New England Biolabs [NEB], Genesearch, Australia; recognition sequence 5’-AATT-3’, leaving four-base 5’ overhangs). Proteins were subsequently removed by organic extraction and the digested DNA was precipitated in the presence of glycogen with sodium acetate and ethanol. The overhangs were partially filled using Klenow Fragment (3’→5’ exo⁻) and dATP (NEB) in NEBuffer 2. Adapters compatible with the partially filled overhangs, as well as with overhangs that had not been successfully filled, were made by annealing oligos (Sigma-Aldrich, Australia) 5’-GTAATACGACTCACTATAGGGCACGCGTGGTCGACGGCCCGGGCTGC and [5’-Phos]TTGCAGCCCG[AmC7] or [5’-Phos]AATTGCAGCCCG[AmC7], respectively, at final concentrations of 40 µM each in 10 mM Tris pH 8.0, 0.1 mM EDTA.
Annealing was performed by incubation in a Mastercycler Gradient PCR machine (Eppendorf, Australia) for 2 min at 92°C, followed by a temperature decrease in increments of 0.1°C every 4 sec to 82°C, every 5 sec to 72°C, every 8 sec to 62°C, every 10 sec to 52°C, every 12 sec to 42°C and every 15 sec to 12°C. Linkers were ligated to digested DNA fragments at a 10-fold molar excess of linker over cut ends using T4 DNA Ligase (NEB) and supplementation with ATP (NEB) to 1 mM. The number of cut ends was estimated from the median fragment length of ~275 bp and the concentration of Tsp509I-digested DNA. After linker ligation, a second restriction enzyme (RE) digestion was performed using BpmI (NEB) to cleave the vector-3’-LTR-derived fragments arising from Tsp509I cleavage, thereby preventing subsequent amplification of an “internal fragment”. Proteins were removed by organic extraction and DNA was precipitated as above and reconstituted in water.

LM-PCR amplifications were carried out with ~500 ng adapter-ligated gDNA fragments as template in 50-µl reactions using HotStarTaq Plus DNA Polymerase (Qiagen), at a final MgCl₂ concentration of 2 mM. The amplification utilized the linker-specific primer L1 (5’-GACTCACTATAGGGCACGCGT) and the MLV LTR-specific primer MLV1 (5’-CATGCCTTGCAAAATGGCGTTACTTAAGC) in a touch-down PCR format: 1× 95°C, 5 min; 7× (94°C, 30 sec; 72°C, 1 min); 37× (94°C, 30 sec; 68°C, 1 min); 1× 68°C, 3 min; hold at 12°C. Amplicons of 120-400 bp were gel-excised and purified using a Wizard SV Gel and PCR Clean-Up System (Promega, Australia). Amplicons >400 bp were gel-purified separately and retained for reprocessing.
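The estimate of cut ends that sets the 10-fold linker excess can be illustrated with a short calculation. The following Python sketch is not part of the original protocol: it assumes an average mass of 650 g/mol per base pair (a common approximation), treats the annealed linker stock as 40 µM duplex, and uses a hypothetical input of 1000 ng digested DNA.

```python
# Average molar mass of one base pair of double-stranded DNA (g/mol).
# Common approximation; not a value stated in the protocol.
AVG_BP_MASS = 650.0


def cut_ends_pmol(dna_ng, median_fragment_bp):
    """Estimate picomoles of fragment ends in a restriction digest.

    Each double-stranded fragment contributes two ends.
    """
    fragment_mass = median_fragment_bp * AVG_BP_MASS  # g/mol per fragment
    fragments_pmol = dna_ng * 1e-9 / fragment_mass * 1e12
    return 2.0 * fragments_pmol


def linker_volume_ul(ends_pmol, molar_excess=10.0, linker_stock_um=40.0):
    """Volume of linker stock (µl) that supplies the requested molar excess.

    A stock at X µM contains X pmol per µl.
    """
    return ends_pmol * molar_excess / linker_stock_um


# Hypothetical example: 1000 ng of Tsp509I-digested DNA, median length 275 bp.
ends = cut_ends_pmol(dna_ng=1000.0, median_fragment_bp=275)
print(f"cut ends: {ends:.2f} pmol")
print(f"linker stock for 10-fold excess: {linker_volume_ul(ends):.2f} µl")
```

Because the estimate scales inversely with the median fragment length, a digest with shorter fragments needs proportionally more linker for the same DNA mass.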
Nested PCR amplifications were carried out under the same master mix reagent concentrations and thermal cycling conditions as the LM-PCR, using 5 µl each of 1 in 10 and 1 in 100 dilutions of the 120-400 bp LM-PCR products in 25-µl reaction volumes, and utilizing a linker-specific primer LNT1 whose 3’ end is complementary to the Tsp509I recognition site (5’-CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTGTCGACGGCCCGGGCTGCAATT) and an MLV LTR-specific primer MLVN1 that is recessed four base pairs from the beginning of the proviral 5’ LTR sequence (5’-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTGCTTGCCAAACCTACAGGTGGGGTCT). Primers LNT1 and MLVN1 were 5’-tailed with the Illumina GAIIx single-read specific sequences (underlined) required for capture on the oligonucleotide lawn on the GAIIx flow cells and subsequent sequencing-by-synthesis. Nested PCR amplicons were size-selected in the same way as LM-PCR products, but with a size range of 160-500 bp. The lower limit was chosen so that amplicons would contain at least 18 bp of genomic DNA sequence adjacent to the ISs, and the upper limit to facilitate optimal bridge amplification on the Illumina flow cells. LM-PCR and nested PCR steps were repeated for each sample until the available starting material had been processed.

Gel-purified LM-PCR amplicons >400 bp were digested in parallel with two other REs having four-base recognition sequences, namely MboI (NEB) and Csp6I (Roche, Australia). Digested fragments were ligated to linkers having overhangs compatible with the respective cut ends and used in LM-PCR amplifications using primers MLVN1 and either LNM1 (5’-CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTGTCGACGGCCCGGGCTGCGATC) or LNC1 (5’-CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTGTCGACGGCCCGGGCTGCTA), respectively. Amplification products were size-selected in the same manner as Tsp509I-generated LM-PCR products.
Aliquots of all Tsp509I-, MboI- and Csp6I-generated LM-PCR products were pooled in proportions such that their final relative contributions to each of the junction fragment libraries were in accordance with their estimated proportions within the original LM-PCR products. The junction fragment libraries were sequenced on an Illumina GAIIx platform in a 1×76 bp read format (Genome Institute of Singapore), using a custom sequencing primer recessed by two positions relative to the standard Illumina single-read sequencing primer.

Sample code for coherence analysis.

% reading the files
phc001.times = dlmread('PHC001_full_datset_IS_hg18');
phc004.times = dlmread('PHC004_full_datset_IS_hg18');
% dividing by 10^9 for each dataset so it fits into the frame
phc001.times = phc001.times/1e+09;
phc004.times = phc004.times/1e+09;
% defining parameters
delay_times = [0 3.0802];
params.Fs = 100;
params.err = [2 0.0500];
params.fpass = [0 50];
params.pad = 0;
params.tapers = [50 99];
delay_times = [0 5.01];   % overrides the value assigned above
% computing coherence
datasp1 = extractdatapt(phc001, delay_times, 1);
datasp2 = extractdatapt(phc004, delay_times, 1);
[C1,phi,S12,S1,S2,f,zerosp,confC,phistd,Cerr1] = coherencypt(datasp1, datasp2, params);
% plotting coherence
figure;
plot_vector(C1, f, 'n', Cerr1-Cerr1, 'b');
ylim([0 1]);

SUPPLEMENTARY TABLES

Table S1 Transduction performance under London and Paris SCID-X1 trial conditions

Vector preparation      MFG-MGMT_1 | MFG-MGMT_2 | MFG-MGMT_3 | MFG-γc(a)
Donor cells             PBMC_1(b) | PBMC_2(c) | Patient BM(d)
CD34+ after isolation   62.8% | 98.6%(e) | 79%
Transduction            London | London | London (“L”)(f) | Paris (“P”)(f) | Paris
Final CD34+             90.29% | 95.59% | 96.41% | 49.30% | 37%
Transgene+              16.16% | 16.36% | 16.64% | 64.85% | 28%
Transgene+ CD34+        15.04% | 15.93% | 15.98% | 28.81% | 10%
Proliferation           1.53× | 1.72× | 1.74× | 2.16× | 3.59×

(a) For treatment of SCID-X1 patient. See ref. 12 in main document.
(b) Harvested 2005 from pediatric oncology patient; frozen as bulk; selected on day 0.
(c) Harvested 1997 from pediatric oncology patient; CD34+ selected and cryopreserved.
(d) BM, bone marrow.
(e) CD34-positivity after thawing.
(f) Transduced cells referred to as L and P in main document.

Table S2 Distribution of integration sites (ISs) relative to genic categories

Dataset        Total ISs   TSS-proximal(a)    Intragenic(a)      Intergenic(a)
MRC            300 000     16 399 (5.47%)     125 864 (41.95%)   157 737 (52.58%)
P              250 213     66 989 (26.77%)    108 033 (43.18%)   75 191 (30.05%)
L              54 431      12 538 (23.03%)    23 356 (42.91%)    18 537 (34.06%)
SCID1_Paris    9 852       2 567 (26.06%)     3 804 (38.61%)     3 481 (35.33%)
SCID1_London   3 470       995 (28.67%)       1 367 (39.39%)     1 108 (31.93%)

(a) Defined in Materials and Methods of the main document.

Table S3 Fisher’s exact test (two-tailed) p-values of TSS-proximal integration site count comparisons (from Table S2)

               MRC        P          L          SCID1_Paris
P              < 0.0001
L              < 0.0001   < 0.0001
SCID1_Paris    < 0.0001   0.1173     < 0.0001
SCID1_London   < 0.0001   0.0128     < 0.0001   0.0030

Table S4 Fisher’s exact test (two-tailed) p-values of intragenic integration site count comparisons (from Table S2)

               MRC        P          L          SCID1_Paris
P              < 0.0001
L              < 0.0001   0.2558
SCID1_Paris    < 0.0001   < 0.0001   < 0.0001
SCID1_London   0.0025     < 0.0001   < 0.0001   0.4179

Table S5 Fisher’s exact test (two-tailed) p-values of intergenic integration site count comparisons (from Table S2)

               MRC        P          L          SCID1_Paris
P              < 0.0001
L              < 0.0001   < 0.0001
SCID1_Paris    < 0.0001   < 0.0001   0.0145
SCID1_London   < 0.0001   0.0170     0.0107     0.0003

Table S6 Parameters for coherence analysis

Parameter                     Whole genome   Chromosome 19
Fs                            3000           1000
fpass                         [0 1500]       [0 500]
err                           [2 0.0500]     [2 0.0500]
pad                           0              0
tapers                        [50 99]        [50 99]
delay_times (divide factor)   10⁹            10⁷
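The pairwise comparisons in Tables S3 to S5 can be approximately reproduced from the counts in Table S2. The Python sketch below is an illustration only: the software used for the published p-values is not stated, so this assumes the common two-tailed definition, which sums the probabilities of all tables that are no more likely than the observed one. The function name and the numerical tolerance are choices made here.

```python
from math import exp, lgamma


def _log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    log_denom = _log_choose(row1 + row2, col1)

    def log_prob(x):
        return _log_choose(row1, x) + _log_choose(row2, col1 - x) - log_denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    log_observed = log_prob(a)
    tolerance = 1e-7  # guards against floating-point ties
    total = sum(
        exp(lp)
        for lp in (log_prob(x) for x in range(lowest, highest + 1))
        if lp <= log_observed + tolerance
    )
    return min(1.0, total)


# TSS-proximal counts and total ISs taken from Table S2.
tss = {"P": (66989, 250213), "SCID1_Paris": (2567, 9852), "SCID1_London": (995, 3470)}


def compare(first, second):
    """p-value for TSS-proximal versus all other ISs in two datasets."""
    (k1, n1), (k2, n2) = tss[first], tss[second]
    return fisher_exact_two_tailed(k1, n1 - k1, k2, n2 - k2)


p_scid = compare("SCID1_Paris", "SCID1_London")  # Table S3 reports 0.0030
p_paris = compare("P", "SCID1_Paris")            # Table S3 reports 0.1173
print(f"{p_scid:.4f} {p_paris:.4f}")
```

Working in log space with `lgamma` avoids overflow for counts in the hundreds of thousands, which is why the binomial coefficients are never evaluated directly.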
SUPPLEMENTARY FIGURE LEGENDS

Figure S1 Association between annotated genomic features and MLV vector integration sites. Increased integration near the indicated feature, calculated by statistical comparison against matched random controls using the ROC area method (references 6 and 32 in the main manuscript), is shown in red, decreased integration in blue, with the intensity of shading correlating with the degree of departure from random integration. Calculations of statistically significant differences in abundance (relative to random integration), indicated by asterisks, are calibrated against the SCID1_Paris dataset; * p < 0.05, ** p < 0.01, *** p < 0.001. Details of relative integration abundance are available as “Supplementary report - Association of Genomic Features with Integration”.

Figure S2 Overrepresentation of MLV vector integration sites at coding genes. Overrepresentation values relative to random sites were calculated for proportions of integration sites falling within ±100 kb of the TSS of each known coding gene. A subset of oncogenes was extracted from this list of genes, leaving “other coding genes” (n = 17 822). Oncogenes associated with hematological malignancies were designated “hematological oncogenes” (n = 89), leaving “other oncogenes” (n = 1 852). Mean overrepresentation values were calculated for each gene category within the two experimental transduction datasets and SCID1_Paris. Mean overrepresentation values were compared for different gene categories within the same datasets, and equivalent gene categories between datasets. All comparisons of mean overrepresentation values showed statistical support of differences (independent t-tests, p < 0.05), except where indicated. Error bars indicate the standard errors of the means.
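The per-gene overrepresentation calculation described for Figure S2 could be sketched as follows. The legend does not give the exact formula, so this Python example assumes that overrepresentation is the ratio of the observed to the random proportion of sites within ±100 kb of a TSS. All function names and coordinates are hypothetical, and a single chromosome is used for brevity.

```python
from bisect import bisect_left, bisect_right
from math import sqrt
from statistics import mean, stdev

WINDOW = 100_000  # ±100 kb around each TSS, as stated in the legend


def fraction_near(sorted_sites, tss, window=WINDOW):
    """Fraction of sites lying within ±window of a TSS (same chromosome)."""
    first = bisect_left(sorted_sites, tss - window)
    last = bisect_right(sorted_sites, tss + window)
    return (last - first) / len(sorted_sites)


def overrepresentation(observed_sites, random_sites, tss):
    """Observed over random fraction; None when no random site is nearby."""
    random_fraction = fraction_near(random_sites, tss)
    if random_fraction == 0:
        return None
    return fraction_near(observed_sites, tss) / random_fraction


def mean_and_sem(values):
    """Mean and standard error of the mean for one gene category."""
    return mean(values), stdev(values) / sqrt(len(values))


# Made-up coordinates for illustration only; both lists must be sorted.
observed = [50_000, 120_000, 180_000, 900_000]
random_sites = [100_000, 500_000, 900_000, 1_300_000]
gene_tss = [100_000, 900_000]

values = [overrepresentation(observed, random_sites, t) for t in gene_tss]
print(values, mean_and_sem(values))
```

Sorting the site coordinates once and using binary search keeps each gene lookup logarithmic in the number of sites, which matters for datasets of the size listed in Table S2.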
SUPPLEMENTARY REPORT

Supplementary report - Association of Genomic Features with Integration. This is available as a separate file. Within this report, datasets Fr1 and En2 correspond to datasets P and L in the main manuscript, respectively. Dataset MLV is not relevant to the main manuscript.