Supplemental Text Quality control of WGS

1. Quality control of raw reads
Raw reads were filtered out if they were contaminated by adapter sequences, or if >50% of their bases had a base quality <5 and the proportion of N bases was >10%. Typically, adapter-contaminated reads accounted for <2% of the total number of reads, low-quality reads for <8%, and N bases for <10%. If a lane did not meet these values, we considered discarding all the reads from that lane.

2. Quality control of WGS
For 30X re-sequencing, whole genome sequences must meet the following criteria: the mapping rate should be >95%, the mismatch rate should be <1%, the GC content should be within the normal range of 35-45%, and the coverage at ≥4X depth should be >95%.

Basic statistics of whole genome sequencing for each individual

| Sample | Mapping rate (%) | Mismatch rate (%) | GC content (%) | Coverage ≥4X (%) | Mean depth |
|---|---|---|---|---|---|
| Unaffected member of HMO Family 1 | 97.14 | 0.53 | 41.09 | 99.13 | 31.01 |
| Proband of HMO Family 1 | 96.92 | 0.58 | 41.30 | 99.19 | 33.52 |
| Proband of HMO Family 3 | 97.87 | 0.40 | 39.75 | 99.60 | 28.39 |
| Patient with Dent disease | 97.31 | 0.46 | 40.23 | 99.47 | 30.75 |

Supplemental Fig. S1 Radiographs showing exostosis lesions in affected HMO individuals.
a Right forearm of Family 1 member III-1, showing exostosis in the ulna and a bowed forearm conferring restricted rotation (arrow).
b Right leg of Family 1 member III-12, showing the tibial exostosis resulting in the destruction of the fibula (arrow).
c Pelvic radiograph of Family 1 member I-2, displaying osteoarthritis and necrosis of the femoral head on the right side of the hip (arrow).
d Radiograph of Family 2 member II-1, showing the exostosis at the epiphysis of the phalanx in the fourth digit of the right hand (arrow).
e Radiograph of Family 2 member II-1, showing multiple exostoses around the left knee joint (arrows).
f Radiograph of Family 3 member II-1, revealing the exostoses in the scapula (arrow).

Supplemental Fig. S2 Pipeline methods employed to accurately characterize the CNVs using whole genome sequencing data from Families 1 and 2 (red letters denote deleted sequences, orange letters denote micro-mutations, blue letters denote insertions and black letters denote unchanged sequences or reference sequences).

Step 1: Prediction of the CNVs using DELLY, BreakDancer and CNVnator software on WGS data, which indicated multiple distinct breakpoints.

Step 2: Determination of the breakpoints by alignment of the truncated reads around the breakpoints with reference sequences. The breakpoints were determined using the truncated reads from read pairs (one read mapped, the other truncated) because these truncated reads had concordant ends near the predicted breakpoint (6 at the 5′ end and 9 at the 3′ end). Thus, the tentative breakpoints of this CNV exemplar were determined to be chr11:43,936,139 and chr11:44,438,037.

Step 3: Tracking sequences in breakpoint regions and fine-tuning of the breakpoints.

(1) Tracking sequences in breakpoint regions. We extracted the 1000-bp flanking sequence at each breakpoint end from the human reference sequence and concatenated the two flanking sequences into a 2000-bp junction sequence, which served as the reference against which the patients' WGS reads were aligned. We extracted all reads with abnormal insert sizes, unexpected strand orientations or truncations, reads whose ends mapped to two different chromosomes, and read pairs in which one read mapped and the other was truncated, and aligned them to the junction sequences using BWA. We thereby obtained information on insertions, microhomologies and micro-mutations.

(2) Fine-tuning.

1) Construct sequences before fine-tuning. We searched for the inserted sequence (GAGAAAAGCATTTGCAAAAA) using BLAT and found a single matching position, 157 bp downstream of the 5′ breakpoint.
Analysis of the patient reads at the 3′ end of the deletion revealed a TGA microhomology (in purple box) that could be assigned either to the deleted sequence or to the breakpoint-flanking sequence, owing to the presence of an insertion at the breakpoint junction.

2) Fine-tuning. By combining the physical positions and the flanking sequence at the breakpoint junction, the 'GTATGA' could be placed in the 3′ flanking region of the 20-bp insertion because it mapped perfectly to the reference sequence.

3) Construct sequences after fine-tuning. Consequently, the 26-bp insertion perfectly matched chr11:43,936,296-43,936,321, and the precise position of the 3′ breakpoint of the CNV was refined to chr11:44,438,043.

4) Patient sequences.

Supplemental Fig. S3 Pipeline methods employed to accurately characterize the CNVs using whole genome sequencing data from Family 3 (red letters denote deleted sequences, orange letters denote micro-mutations, blue letters denote insertions and black letters denote unchanged sequences or reference sequences).

Step 1: Prediction of the CNVs using DELLY, BreakDancer and CNVnator software on WGS data, which indicated multiple distinct breakpoints.

Step 2: Determination of the breakpoints by alignment of the truncated reads around the breakpoints with reference sequences. The breakpoints were determined using the truncated reads from read pairs (one read mapped, the other truncated) because these truncated reads had concordant ends near the predicted breakpoint (12 at the 5′ end and 10 at the 3′ end). Thus, the breakpoints of this CNV exemplar were determined to be chr11:44,128,440 and chr11:44,198,500.

Step 3: Tracking sequences in breakpoint regions. We extracted the 1000-bp flanking sequence at each breakpoint end from the human reference sequence and concatenated the two flanking sequences into a 2000-bp junction sequence, which served as the reference against which the patient's WGS reads were aligned.
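The junction-reference construction described in Step 3 can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `build_junction`, the in-memory reference string and the coordinate convention (1-based positions, with `bp5` the last retained base before the deletion and `bp3` the first retained base after it) are assumptions made for illustration only.

```python
def build_junction(reference: str, bp5: int, bp3: int, flank: int = 1000) -> str:
    """Concatenate the flanks on either side of a deletion into one junction sequence.

    Assumed convention (hypothetical, not stated in the source):
    bp5 is the 1-based position of the last retained base upstream of the deletion,
    bp3 is the 1-based position of the first retained base downstream of it.
    """
    if flank < 1 or bp5 - flank < 0 or bp5 >= bp3 or bp3 - 1 + flank > len(reference):
        raise ValueError("breakpoints or flank size do not fit the reference")
    upstream = reference[bp5 - flank:bp5]            # flank ending at bp5
    downstream = reference[bp3 - 1:bp3 - 1 + flank]  # flank starting at bp3
    return upstream + downstream


# Toy reference: 10 retained bases, 5 deleted bases, 10 retained bases.
toy_reference = "A" * 10 + "C" * 5 + "G" * 10
junction = build_junction(toy_reference, bp5=10, bp3=16, flank=4)
print(junction)
```

With `flank=1000` and a real chromosome sequence, the same call would yield the 2000-bp junction sequence against which the extracted reads are aligned.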
We extracted all reads with abnormal insert sizes, unexpected strand orientations or truncations, reads whose ends mapped to two different chromosomes, and read pairs in which one read mapped and the other was truncated, and aligned them to the junction sequences using BWA. Consequently, we found a 5-bp insertion (TCTTG) within the breakpoint junction and a CC insertion in the flanking region of the breakpoint.

Supplemental Fig. S4 Pipeline methods employed to accurately characterize the CNVs using whole genome sequencing data from the Dent disease patient (red letters denote deleted sequences, orange letters denote micro-mutations, blue letters denote insertions and black letters denote unchanged sequences or reference sequences).

Step 1: Prediction of the CNVs using DELLY, BreakDancer and CNVnator software on WGS data, which indicated multiple distinct breakpoints.

Step 2: Determination of the breakpoints by alignment of the truncated reads around the breakpoints with reference sequences. The breakpoints were determined using the truncated reads from read pairs (one read mapped, the other truncated) because these truncated reads had concordant ends near the predicted breakpoint (5 at the 5′ end and 5 at the 3′ end). Thus, the breakpoints of this CNV exemplar were determined to be chrX:49,780,222 and chrX:49,840,741.

Step 3: Tracking sequences in breakpoint regions. We extracted the 1000-bp flanking sequence at each breakpoint end from the human reference sequence and concatenated the two flanking sequences into a 2000-bp junction sequence, which served as the reference against which the patient's WGS reads were aligned. We extracted all reads with abnormal insert sizes, unexpected strand orientations or truncations, reads whose ends mapped to two different chromosomes, and read pairs in which one read mapped and the other was truncated, and aligned them to the junction sequences using BWA.
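The read-selection criteria in Step 3 can be sketched as a small classifier over the standard SAM alignment fields. This is an illustrative sketch rather than the authors' implementation. Several points are assumptions: "truncated" is interpreted as a soft- or hard-clipped alignment, "unexpected strand orientation" is simplified to both mates lying on the same strand, and the insert-size cut-off `max_insert` is a hypothetical value, since the source gives no threshold.

```python
import re

# Standard SAM FLAG bits.
PAIRED, UNMAPPED, MATE_UNMAPPED, REVERSE, MATE_REVERSE = 0x1, 0x4, 0x8, 0x10, 0x20


def candidate_reasons(flag, rname, rnext, tlen, cigar, max_insert=1000):
    """Return the reasons why an aligned read would be kept for junction alignment.

    max_insert is a hypothetical cut-off; the source does not state one.
    """
    reasons = []
    if flag & UNMAPPED:
        return reasons  # an unmapped read has no placement to evaluate
    if re.search(r"\d+[SH]", cigar):
        reasons.append("truncated")  # assumption: truncated = clipped alignment
    if (flag & PAIRED) and not (flag & MATE_UNMAPPED):
        if rnext not in ("=", rname):
            reasons.append("interchromosomal")
        else:
            if abs(tlen) > max_insert:
                reasons.append("abnormal_insert_size")
            # Simplification: only same-strand pairs are flagged here.
            if bool(flag & REVERSE) == bool(flag & MATE_REVERSE):
                reasons.append("unexpected_orientation")
    return reasons


# Hypothetical example records: (FLAG, RNAME, RNEXT, TLEN, CIGAR).
print(candidate_reasons(99, "chr11", "=", 350, "150M"))
print(candidate_reasons(97, "chr11", "=", 502000, "150M"))
print(candidate_reasons(65, "chr11", "=", 300, "90M60S"))
print(candidate_reasons(97, "chr11", "chrX", 0, "150M"))
```

Reads for which the list is non-empty would be the ones passed on to the junction alignment. A fuller version would also flag outward-facing pairs.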
Consequently, we found a 22-bp insertion (TACATATAGTGACAGGGAATGG) at the breakpoint junction.

Supplemental Fig. S5 FISH analysis of the cultured blood cells from the proband of HMO Family 1 (III-1). The EXT2 gene signal is shown in red whilst the control signal from the centromeric sequences of chromosome 11 is shown in green. Note the absence of the EXT2 gene signal in one of the chromosome 11 homologues in both metaphase (a) and interphase (b) cells.

Supplemental Fig. S6 Identification of CNVs by MLPA (Multiplex Ligation-dependent Probe Amplification) and chromosome microarray analyses.
a MLPA electropherogram of the HMO Family 1 proband showing the amplification ratio of all EXT2 probes relative to the reference probes (as well as the EXT1 probes). An identical MLPA electropherogram was observed in the Family 2 proband.
b MLPA electropherogram of the HMO Family 3 proband showing a heterozygous deletion of exons 2-8 of EXT2 (defined by probes EXT2-04 to EXT2-10). The horizontal red line indicates the threshold ratio indicative of a heterozygous deletion.
c Chromosome microarray analysis of the boy with Dent disease revealing a deletion involving part of the CLCN5 gene. By the weighted log2 ratio method (upper panel), the copy number of the X chromosome for a normal male corresponds to baseline -0.5 on the scatterplot. By the copy number state method (bottom panel), the copy number of the X chromosome for a normal male is 1.0. The probes revealing a zero copy number indicate a ~50 kb deletion at Xp11.23-p11.22 with a minimum range of 49,790,892-49,840,451 (hg19).

Supplemental Table S1 Clinical characteristics of the patients

| HMO patient | Sex | Age^a (years) | No. of exostoses | Location of exostoses | Other clinical phenotypes |
|---|---|---|---|---|---|
| Family 1-I-2 | Female | 70 | 2 | Femur | Hip osteoarthritis, necrosis of femoral head, and scoliosis |
| Family 1-II-3 | Male | 48 | 6 | Humerus, tibia, ulna and radius | No |
| Family 1-II-5 | Male | 45 | 5 | Femur, fibula and radius | Dislocation of radioulnar joint |
| Family 1-II-8 | Female | 40 | 4 | Femur and humerus | No |
| Family 1-III-1 | Male | 26 | 8 | Femur, tibia, fibula, humerus, ulna and radius | Forearm deformity and wrist joint dysfunction |
| Family 1-III-5 | Male | 13 | 7 | Femur, tibia, fibula, ulna and radius | No |
| Family 1-III-9 | Female | 7 | 5 | Femur and rib | No |
| Family 1-III-11 | Male | 14 | 13 | Femur, tibia, fibula, humerus, ulna and radius | No |
| Family 1-III-12 | Male | 12 | 15 | Femur, tibia, fibula, humerus, ulna and radius | No |
| Family 2-I-1 | Male | 45 | 6 | Femur and tibia | No |
| Family 2-II-1 | Male | 13 | 17 | Femur, tibia, fibula and phalanx | No |
| Family 3-II-1 | Male | 10 | 6 | Femur, ulna, radius and scapula | No |

Surgical therapy / Pain (cell values in source order): No Yes No No No No No No No No No No Yes Yes Yes No Yes No No No No No No No

Dent disease patient II-1
Sex: Male
Age* (years): 12
Renal damage: Positive urinary protein, low-molecular-weight proteinuria, hypercalciuria, microscopic hematuria and intermittent hematuria
Histopathological changes: Mild mesangial proliferative glomerulonephritis, focal glomerulosclerosis and crescent formation in glomeruli
Other phenotypes: Mild growth retardation

a Age at diagnosis.

Supplemental Table S2 PCR primers for the generation of EXT2 gene probes used for FISH analysis

| Probe | Forward (5′-3′) | Reverse (5′-3′) |
|---|---|---|
| Fish-1 | CGTGGTGTCTCGTTTGGGTTTAAG | GATCTGGTTCCCACCGAATGTAAC |
| Fish-2 | GGCAATGCTCAAGGTATAGA | AGAAATCCAAGGTAGTAACGGT |
| Fish-3 | TTAGGCACTGCGAATACTTAGATA | GCCCACCACACTAAACCTC |
| Fish-4 | CTTTTCTTGAGACCACTTGAACCA | CTAGGGCTTGAACATTCCACG |
| Fish-5 | TTTCCCTTGTAGTCCACGGCAATAC | ACTCCCTCAAACCCCCTCAATGT |
| Fish-6 | GGGGAAAGCCTATTGTATCAGT | CTTTTTCCTAATCAGCCCACTAC |
| Fish-7 | CTCCTGGGGCAGCATTTAAGTA | GCCCATTGGATTTTGCTTATCAC |
| Fish-8 | AGTGATAGATGGTATTGGACCTAC | GGCCTAACTCTTCTGATAACTCT |
| Fish-9 | TCTCTTTGTCCCATGTTCTATT | GCCCCATTGTAATTCTACG |
| Fish-10 | GCAATAGACAAATACTGAAACCTAC | GATTCAAGAGATCCGAGCTAC |
| Fish-11 | CCTCTGGGCTGAAATGTTACTACTG | AATACTCTCATCTGGCTGATCCCTT |
| Fish-12 | AGAGGCTGGGTTCAGACTAAATC | CAGCATTAATGGGGAAATAGGA |
| Fish-13 | ATTTGTTGAACTCTGGTCCATT | TTTAGGAATTTCTGGGCTACAG |
| Fish-14 | GGCAACATGGACCACATTACTGAT | CTGGCTGACCAAGGAGAGTGTCTA |
| Fish-15 | CGCCATAGTCCTCACCTACGACC | TGAACAAACACCCCACAGAAGATTAAAC |
| Fish-16 | GAATCTCCCCTGACACAGTTCTACCT | GCAATGAAGAGAGAAATCACTCGC |
| Fish-17 | CCTTAAAGGCACACCATAGCAAGT | GGCCCCCTCATCACTAATTAAATC |
| Fish-18 | CCCTTTGAGTTCATCTTGGAC | TAAACCAGCCAACAGACAGTAGTA |
| Fish-19 | TTGCTAGGGAGATCGCTAGTTAAGGT | TCTCTTCCAAAGGAGCTACGACAGT |
| Fish-20 | AAGCAGCATCTCCTGTTCACGTT | GACCCTCTGTTTTTCTCTGACAATACC |

Supplemental Table S3 Primers for long-range PCR and subsequent Sanger sequencing, used to confirm the whole-genome sequencing findings with respect to the three pathogenic CNVs*

| Family | Primer | Forward (5′-3′) | Primer | Reverse (5′-3′) |
|---|---|---|---|---|
| Families 1/2 | L1 | CGAGGCTTGCTCTCCAACTTCTTAAC | D1 | CCTGGGCTCTTCAACTAGGACAGTAAAC |
| Families 1/2 | F1 | AAGAAGTCTGGCAGGATG | R1 | GCTGGGATGAGTAGGTC |
| Family 3 | L2 | TCTTAAAATGTGGTCTACATGGGAACT | D2 | AGTCCAGGGAAGTATCTAATCCTCATC |
| Family 3 | F2 | TCACCGCAACCTCCAC | R2 | TCCCCTAATAAAGAAC |
| Dent disease patient | L3 | GGTGGGCTTGTCTGTGTATTAGAAT | D3 | GTTTCTGTTATTTTGACATGGAATGC |
| Dent disease patient | F3 | TGCCCTTTATCTTCCA | R3 | CTGCCTCTGACACTTCT |

* L and D indicate primers for long-range PCR whilst F and R indicate primers for Sanger sequencing.
Supplemental Table S4 LOD scores for chromosome 11 markers in HMO Family 1

| Microsatellite marker | LOD score at θ=0 | θ=0.01 | θ=0.05 | θ=0.1 | θ=0.2 | θ=0.3 | θ=0.4 | Zmax |
|---|---|---|---|---|---|---|---|---|
| D11S4102 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| D11S905 | 4.21 | 4.15 | 3.88 | 3.53 | 2.76 | 1.91 | 0.96 | 4.21 |
| D11S4191 | 3.01 | 2.96 | 2.77 | 2.51 | 1.95 | 1.32 | 0.65 | 3.01 |
| D11S987 | -∞ | 0.97 | 1.49 | 1.55 | 1.34 | 0.95 | 0.47 | 1.55 |
| D11S4162 | -∞ | 2.15 | 2.58 | 2.53 | 2.06 | 1.38 | 0.61 | 2.58 |
| D11S1314 | -∞ | 2.45 | 2.88 | 2.83 | 2.36 | 1.68 | 0.87 | 2.88 |

Supplemental Table S5 The three templated inserts derived from distant regions of the human genome

| Sample | CNV | Breakpoint insertion (5′ to 3′) | Origin of inserted sequences | Genic region |
|---|---|---|---|---|
| Dent disease patient | ChrX deletion | TACATATAGTGACAGGGAATGG | ChrX:49701701-49701722(+) | CLCN5 intron |
| 114816 | Chr19 deletion | ATTTGGCAGAGGGGGATTTGGCAGGGTCATAGGACAACAGCGGAGGGAAGGTCAG | Chr17:15999543-15999592(-) | NCOR1 intron |
| 120099/120098 | Chr6 deletion | GTCACCCAGTCTGGAGTGCTGT | Chr1:10452912-10452933(-) | Intergenic region |
| | | | Chr1:187864763-187864784(+) | Intergenic region |
| | | | Chr1:201144808-201144829(-) | Intergenic region |
| | | | Chr4:53575026-53575047(+) | Intergenic region |
| | | | Chr9:26186546-26186567(-) | Intergenic region |
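The WGS acceptance criteria stated in section 2 of the Supplemental Text (mapping rate >95%, mismatch rate <1%, GC content 35-45%, coverage at ≥4X depth >95%) can be checked against the reported per-sample statistics with a short Python sketch. The class and function names (`WgsStats`, `failed_criteria`) are hypothetical and the code is illustrative only. Mean depth is carried along but not tested, because the text gives no explicit threshold for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WgsStats:
    sample: str
    mapping_rate: float   # %
    mismatch_rate: float  # %
    gc_content: float     # %
    coverage_4x: float    # % of genome covered at >=4X depth
    mean_depth: float     # reported only; no threshold stated in the text


def failed_criteria(stats: WgsStats) -> list:
    """Return the names of the stated criteria that a sample does not meet."""
    failures = []
    if not stats.mapping_rate > 95:
        failures.append("mapping_rate")
    if not stats.mismatch_rate < 1:
        failures.append("mismatch_rate")
    if not 35 <= stats.gc_content <= 45:
        failures.append("gc_content")
    if not stats.coverage_4x > 95:
        failures.append("coverage_4x")
    return failures


# Values copied from the basic statistics table in the Supplemental Text.
SAMPLES = [
    WgsStats("Unaffected member of HMO Family 1", 97.14, 0.53, 41.09, 99.13, 31.01),
    WgsStats("Proband of HMO Family 1", 96.92, 0.58, 41.30, 99.19, 33.52),
    WgsStats("Proband of HMO Family 3", 97.87, 0.40, 39.75, 99.60, 28.39),
    WgsStats("Patient with Dent disease", 97.31, 0.46, 40.23, 99.47, 30.75),
]

for s in SAMPLES:
    print(s.sample, failed_criteria(s) or "pass")
```

All four sequenced individuals satisfy the four stated criteria under this check.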
