SUPPLEMENTARY MATERIAL

INDEX
Tables S1-S10
Figures S1-S3

Table S1. Target gene list.

ABO ADAMTS13 ADRB1 ADRB2 ADRB3 ADRBK1 ANXA5 APCS APOH ARHGEF1 AXL C1orf114
C1QB C1R C1S CADM1 CALM1 CALM2 CALM3 CALR CASP8AP2 CD34 COL1A1 COL1A2
COL2A1 COL3A1 COL4A1 COL4A2 COL4A3 COL4A3BP COL4A4 COL4A5 COL4A6 COL6A1 COL6A2 COL6A3
CR2 CRP CYP4V2 EDIL3 EPS8L2 F10 F11 F12 F13A1 F13B F2 F2RL1
F2RL2 F2RL3 F3 F5 F7 F8 F8 F9 FCGR2A FCGR2B FGA FGA
FGB FGG FTO GAS6 GNAI3 GNAQ GNAS GNB2L1 GP1BA GP1BB GP5 GP6
GP9 HIF1A HIF3A HRH2 HRH2 HTR1A HTR1B HTR1D HTR1E HTR1F HTR2A HTR2B
HTR2C HTR3A HTR3B HTR4 HTR6 ICAM1 ICAM2 ICAM3 ICAM4 ICAM5 IL17A IL1A
IL1B IL23A IL6 ITFG2 ITGA2 ITGA2B ITGB1 ITGB2 ITGB3 KNG1 LDLR LPA
LRP1 MARCKS MERTK MET MFGE8 MTHFR MYBPC3 NAT8B NFKB1 NFKB2 NOS3 ODZ1
OS9 P2RY12 PCSK9 PF4 PKN2 PKN3 PLA2G6 PLAT PLAU PLCB1 PLCB2 PLCB3
PLCB4 PLCG1 PLCG2 PLG PPP1CA PPP1CB PPP2CA PPP2CB PPP3CA PPP3CB PRKCA PRKCD
PRKCQ PRKD1 PRKD2 PRKD3 PROC PROS1 PROZ PSEN1 PTGS1 PTGS2 PTK2B PTX3
RASGRP2 RGS7 RND2 SCARB1 SELE SELP SERPINB2 SERPINC1 SERPIND1 SERPINE1 SERPINF2 TACR1
TBX2 TBXAS1 TFPI THBD TLN1 TLR1 TLR10 TLR2 TLR3 TLR4 TNF TYK2
TYRO3 VASP VHL VHLL VWF ZNF544

Table S2. Characteristics of the individuals who underwent next-generation sequencing.
ID        Age    Gender  Episodes  Type of       PE at      BMI    ATIII  PC   PS   PT    aPTT   Fibrinogen
          years          n         episodes      first DVT  kg/m2  %      %    %    INR   ratio  mg/dL
DVT_P_01  48     F       2         DVT, SVT      No         30.1   120    101  102  1.07  0.91   361
DVT_P_02  20     M       2         DVT (2)       No         23.3   106    134  188  1.04  0.81   278
DVT_P_03  48     M       1         DVT           No         24.6   103    116  105  0.94  1.02   388
DVT_P_04  39     M       1         DVT           Yes        24.8   104    77   153  1     1.27   301
DVT_P_05  35     M       1         DVT           Yes        24.7   111    88   101  1.01  0.92   279
DVT_P_06  32     F       2         DVT (2)       No         19.8   97     86   101  1.05  0.93   324
DVT_P_07  48     M       1         DVT           No         21.5   116    63   100  1.03  1.13   304
DVT_P_08  23     F       1         DVT           No         30.1   95     81   135  1.06  1.16   374
DVT_P_09  37     M       3         DVT, SVT (2)  Yes        34.6   87     78   166  1.02  0.94   365
DVT_P_10  55     F       3         DVT, SVT (2)  No         22.9   98     88   146  1     0.9    265
DVT_C_01  45     F       /         /             /          22.7   115    78   98   1.01  1.00   374
DVT_C_02  25     M       /         /             /          24.4   96     81   116  1.15  0.97   229
DVT_C_03  48     M       /         /             /          32.2   96     129  135  0.98  1.02   235
DVT_C_04  39     M       /         /             /          25.3   101    72   100  1.10  1.14   206
DVT_C_05  37     M       /         /             /          24.2   115    135  145  1.00  0.99   208
DVT_C_06  34     F       /         /             /          29.6   109    141  121  1.03  0.96   366
DVT_C_07  46     M       /         /             /          27.1   87     156  123  0.92  0.98   311
DVT_C_08  25     F       /         /             /          18.7   111    118  97   0.97  0.97   244
DVT_C_09  40     M       /         /             /          28.1   106    113  124  0.92  1.01   207
DVT_C_10  56     F       /         /             /          21.5   103    89   99   1.02  0.98   285
DVT_C_11  48     F       /         /             /          26.2   93     110  108  0.90  1.00   371
DVT_C_12  50     F       /         /             /          22.0   102    100  95   0.97  1.06   272

Episodes reports the number of thrombotic episodes. Type of episodes reports the types of thrombotic episode in the patient's history; the number of episodes of each type is given in parentheses. DVT indicates deep vein thrombosis; SVT, superficial vein thrombosis; PE, pulmonary embolism; BMI, body mass index; ATIII, antithrombin; PC, protein C; PS, protein S; PT, prothrombin time; INR, international normalized ratio; aPTT, activated partial thromboplastin time.

Table S3. Individual sequence and coverage statistics.
Individual  Unique %  On-target  Average Coverage  1x Cov.  10x Cov.  20x Cov.  40x Cov.
DVT_P_01    61%       7%         62                99%      96%       93%       83%
DVT_P_02    61%       6%         62                99%      96%       93%       84%
DVT_P_03    62%       7%         60                99%      96%       92%       82%
DVT_P_04    63%       7%         53                98%      95%       91%       76%
DVT_P_05    39%       7%         34                98%      88%       74%       39%
DVT_P_06    37%       7%         33                98%      88%       75%       39%
DVT_P_07    40%       7%         31                98%      87%       70%       33%
DVT_P_08    41%       8%         28                98%      87%       68%       27%
DVT_P_09    39%       6%         33                98%      89%       74%       39%
DVT_P_10    39%       7%         32                98%      88%       73%       35%
DVT_C_01    60%       7%         63                99%      96%       93%       84%
DVT_C_02    63%       7%         56                99%      95%       91%       79%
DVT_C_03    61%       7%         62                99%      96%       93%       83%
DVT_C_04    62%       7%         58                99%      95%       92%       80%
DVT_C_05    40%       7%         33                98%      89%       74%       37%
DVT_C_06    36%       6%         38                98%      91%       80%       48%
DVT_C_07    39%       7%         31                98%      87%       71%       35%
DVT_C_08    33%       6%         33                98%      89%       74%       37%
DVT_C_09    34%       6%         28                99%      85%       66%       26%
DVT_C_10    32%       5%         33                98%      89%       75%       38%
DVT_C_11    37%       7%         32                98%      89%       74%       35%
DVT_C_12    39%       7%         34                98%      90%       76%       40%

Raw Mb and Unique Mb, values as extracted (the two rows are interleaved and cannot be assigned to individuals):
532 592 525 422 538 513 194 471 147 367 203 544 177 472 328 488 561 471 210 548 183
325 361 326 266 305 342 293 463 176 644 222 472 176 592 186 527 173 748 230 501 175 519 197

Table S4. General sequence and coverage statistics.

Statistics        Average  Min  Max
Raw Mb            523      367  748
Unique Mb         236      147  361
Unique %          46%      32%  63%
On-target         7%       5%   8%
Average Coverage  42       28   63
1x Cov.           98%      98%  99%
10x Cov.          91%      85%  96%
20x Cov.          80%      66%  93%
40x Cov.          53%      26%  84%

Table S5. Individual single nucleotide variant statistics.
Individual  Heterozygous  Homozygous  Ratio Het/Hom  Non-coding  Coding  Syn  Nsyn  Ratio S/NS  Nonsense  dbSNP129  Not in dbSNP129  %Novel  Ti/Tv  TOT SNVs
DVT_P_01    367           141         2.60           258         250     150  100   1.50        0         469       39               8       2.60   508
DVT_P_02    324           187         1.73           246         265     153  112   1.37        0         461       50               10      2.28   511
DVT_P_03    356           166         2.14           238         284     159  125   1.27        1         477       45               9       2.81   522
DVT_P_04    326           153         2.13           233         246     145  101   1.44        0         429       50               10      2.57   479
DVT_P_05    292           138         2.12           166         264     158  106   1.49        1         404       26               6       2.71   430
DVT_P_06    315           127         2.48           174         268     156  112   1.39        0         412       30               7       2.71   442
DVT_P_07    254           136         1.87           164         226     135  91    1.48        0         366       24               6       2.82   390
DVT_P_08    275           143         1.92           175         243     155  88    1.76        1         390       28               7       2.73   418
DVT_P_09    277           145         1.91           170         252     153  99    1.55        1         396       26               6       3.06   422
DVT_P_10    264           172         1.53           188         248     148  100   1.48        0         406       30               7       2.60   436
DVT_C_01    366           178         2.06           254         290     167  123   1.36        0         489       55               10      2.32   544
DVT_C_02    339           165         2.05           235         269     170  99    1.72        0         464       40               8       2.68   504
DVT_C_03    359           151         2.38           248         262     157  105   1.50        0         461       49               10      2.64   510
DVT_C_04    344           164         2.10           232         276     176  100   1.76        0         462       46               9       2.74   508
DVT_C_05    283           150         1.89           184         249     146  103   1.42        1         398       35               8       3.01   433
DVT_C_06    273           155         1.76           194         234     136  98    1.39        0         404       24               6       2.72   428
DVT_C_07    260           154         1.69           172         242     142  100   1.42        0         382       32               8       2.66   414
DVT_C_08    262           160         1.64           186         236     139  97    1.43        0         392       30               7       2.94   422
DVT_C_09    267           140         1.91           178         229     132  97    1.36        0         387       20               5       3.42   407
DVT_C_10    311           157         1.98           197         271     166  105   1.58        2         442       26               6       3.03   468
DVT_C_11    324           116         2.79           183         257     154  103   1.50        1         419       21               5       2.96   440
DVT_C_12    272           158         1.72           178         252     145  107   1.36        0         405       25               6       2.55   430

Syn and Nsyn are the synonymous and nonsynonymous subsets of the coding variants. Ratio S/NS and TOT SNVs are listed so that each value matches its individual (Syn/Nsyn and Heterozygous + Homozygous, respectively).

Table S6. General single nucleotide variant statistics.

Type of variant  Average  Range
heterozygous     305      254-367
homozygous       153      116-187
Ratio Het/Hom    2        2-3
Non-coding       202      164-258
Coding           255      226-290
syn              152      132-176
nsyn             103      88-125
Ratio S/NS       1.48     1.27-1.76
nonsense         0        0-2
dbSNP129         423      366-489
not in dbSNP129  34       20-55
%Novel           7        5-10
Ti/Tv            2.75     2.28-3.42
TOT SNVs         458      390-544

Table S7. Indel statistics.
Type of variant  Average  Min  Max
insertions       3        0    7
deletions        5        1    10
homozygous       1        0    3
heterozygous     7        2    13
non-coding       7        1    14
coding           1        0    2
frameshift       1        0    2
in frame         0.2      0    1
total            8        2    15

Table S8. Variants present in human gene mutation database, HGMD®.

Chromosome  Coordinate  Minor allele  Major allele  Gene    Functional annotation  dbSNP       Associated disease                                    Assoc. with thrombotic disease  Assoc. with DVT or DVT-associated phenotype
chr1        55301775    G             A             PCSK9   Missense               rs505151    Atherosclerosis, severity, association with           Yes                             No
chr1        167765599   C             T             F5      Missense               rs6030      Thrombosis ?                                          Yes                             Yes
chr1        167778379   C             T             F5      Missense               rs4524      Thrombosis, increased risk, association with          Yes                             Yes
chr1        167788473   C             T             F5      Missense               novel       Thrombosis ?                                          Yes                             Yes
chr1        167831970   A             C             SELP    Missense               rs6133      Atopy, increased risk, association with               No                              No
chr1        167832937   C             T             SELP    Missense               novel       Higher platelet SELP measures, association with       No                              No
chr1        195297644   C             T             F13B    Missense               rs6003      Myocardial infarction, risk, association with         Yes                             No
chr1        205694316   C             T             CR2     Regulatory             rs3813946   Increased transcriptional activity, association       No                              No
chr4        155731347   T             C             FGA     Regulatory             rs2070011   Venous thromboembolism, suscep., association with     Yes                             Yes
chr4        187357205   C             A             CYP4V2  Missense               rs13146272  Deep vein thrombosis, reduced risk, association with  Yes                             Yes
chr5        148186633   A             G             ADRB2   Missense               rs1042713   Asthma, nocturnal, association with                   No                              No
chr5        148186666   G             C             ADRB2   Missense               rs1042714   Obesity, association with                             No                              Yes
chr5        176769138   A             G             F12     Regulatory             rs1801020   Premature myocardial infarction, association with     No                              No
chr7        93881175    C             G             COL1A2  Missense               rs42524     Intracranial aneurysm, suscept., assoc. with          No                              No
chr7        150327044   T             G             NOS3    Missense               rs1799983   Coronary spasm, association with                      Yes                             No

Table S9. Nonsynonymous variants in coagulation genes. Annotations and allele counts in the next-generation sequencing experiments are reported.
Gene  Chromosome  Coordinate  Reference allele  Variant allele  Transcript ID  Protein change  dbSNP129    1000Genomes CEU population, AF  SIFT  Polyphen 2  Alleles cases  Alleles controls
FGA   chr4        155726496   C                 T               NM_000508      p.R512K         novel       not present                     Ben   Ben         0              1
FGA   chr4        155726824   C                 T               NM_000508      p.A403T         novel       not present                     Ben   Ben         0              1
FGA   chr4        155727010   T                 A               NM_000508      p.T341S         novel       not present                     Ben   Ben         1              0
FGA   chr4        155727040   T                 C               NM_000508      p.T331A         rs6050      0.217                           Ben   Pod         9              4
FGB   chr4        155706593   C                 T               NM_005141      p.P100S         rs2227434   not present                     Dam   Ben         1              1
FGB   chr4        155711209   G                 A               NM_005141      p.R478K         rs4220      0.225                           Ben   Ben         3              9
F2    chr11       46701579    C                 T               NM_000506      p.T165M         rs5896      0.083                           Dam   Pod         1              1
F3    chr1        94768650    C                 T               NM_001993      p.G281E         rs3789683   not present                     Ben   Ben         0              1
F5    chr1        167750185   T                 C               NM_000130      p.D2222G        rs6027      0.033                           Dam   Prd         2              4
F5    chr1        167751391   A                 G               NM_000130      p.M2148T        rs9332701   0.017                           Dam   Prd         0              1
F5    chr1        167765599   T                 C               NM_000130      p.M1764V        rs6030      0.25                            Ben   Ben         4              7
F5    chr1        167776742   G                 A               NM_000130      p.P1404S        rs9332608   0.042                           Dam   Ben         1              0
F5    chr1        167777353   C                 A               NM_000130      p.S1200I        novel       not present                     Dam   Pod         1              0
F5    chr1        167778179   T                 C               NM_000130      p.K925E         rs6032      0.225                           Ben   Ben         2              4
F5    chr1        167778358   T                 C               NM_000130      p.H865R         rs4525      0.225                           Ben   Ben         3              4
F5    chr1        167778379   T                 C               NM_000130      p.K858R         rs4524      0.225                           Ben   Ben         3              3
F5    chr1        167778502   T                 G               NM_000130      p.N817T         rs6018      0.025                           Dam   Ben         2              6
F5    chr1        167785736   C                 T               NM_000130      p.R513K         rs6020      not present                     Ben   Ben         0              1
F5    chr1        167788477   A                 G               NM_000130      p.M413T         rs6033      0.033                           Ben   Ben         2              4
F5    chr1        167808137   C                 G               NM_000130      p.D107H         rs6019      0.05                            Dam   Ben         3              1
F7    chr13       112818013   G                 A               NM_000131      p.G157S         novel       not present                     Ben   Prd         1              0
F7    chr13       112820770   G                 A               NM_000131      p.R283Q         novel       not present                     Ben   Pod         0              1
F7    chr13       112821160   G                 A               NM_000131      p.R413Q         rs6046      0.1                             Ben   Ben         3              1

Table S9.
(continued)

Gene  Chromosome  Coordinate  Reference allele  Variant allele  Transcript ID  Protein change  dbSNP129    1000Genomes CEU population, AF  SIFT  Polyphen 2  Alleles cases  Alleles controls
F8    chrX        153811479   G                 C               NM_000132      p.D1260E        rs1800291   not available                   Ben   Ben         0              2
F9    chrX        138460946   A                 G               NM_000133      p.T194A         rs6048      not present                     Ben   Ben         1              5
F12   chr5        176763563   G                 C               NM_000505      p.P385A         novel       not present                     Dam   Prd         0              1
F12   chr5        176763842   G                 A               NM_000505      p.P327S         novel       not present                     Ben   Ben         1              0
F12   chr5        176764432   C                 G               NM_000505      p.A207P         rs17876030  0.008                           Ben   Pod         1              2
F12   chr5        176764772   G                 C               NM_000505      p.L140V         rs35515200  0.017                           Ben   Pod         1              0
F13A  chr6        6097136     C                 G               NM_000129      p.E652Q         rs5988      0.233                           Ben   Ben         4              5
F13A  chr6        6097139     C                 T               NM_000129      p.V651I         rs5987      0.042                           Ben   Ben         0              2
F13A  chr6        6119865     G                 A               NM_000129      p.P565L         rs5982      0.217                           Ben   Ben         4              5
F13A  chr6        6263794     C                 A               NM_000129      p.V35L          rs5985      0.2                             Ben   Ben         5              4
F13B  chr1        195292774   T                 A               NM_001994      p.E388V         rs5991      not present                     Ben   Pod         1              0
F13B  chr1        195292912   A                 G               NM_001994      p.I342T         rs17514281  0.008                           Dam   Ben         0              1
F13B  chr1        195297644   C                 T               NM_001994      p.R115H         rs6003      0.058                           Ben   Ben         1              2

The 1000Genomes CEU population field reports whether the variant is annotated in the 1000Genomes database; when the variant is present, its allele frequency (AF) in the CEU population is reported. In the SIFT and Polyphen 2 annotation results, Ben indicates predicted benign; Dam, potentially damaging according to SIFT; Pod, possibly damaging according to Polyphen 2; Prd, probably damaging according to Polyphen 2.

Table S10. Novel missense variants identified during replication.

Gene   Chromosome  Coordinate  Ref Base  Var Base  Functional effect
FGA    chr4        155726925   C         T         p.S369N
FGA    chr4        155726929   C         T         p.G368R
FGA    chr4        155726935   C         T         p.E366K
FGA    chr4        155727054   G         T         p.S326Y
FGA    chr4        155727127   G         C         p.P302A
FGA    chr4        155727156   G         A         p.A292V
PLCG2  chr16       80474389    C         T         p.P236L

Figure S1. Patient selection flowchart.

2139 referred for lower limb DVT
  -> 374 withdrawn consent or DNA not available
  -> 1765 available for inclusion
       -> 1035 secondary DVT
       -> 730 idiopathic DVT
            -> Selection criteria (see main text)
                 -> 11 (of 42 eligible) selected for next-generation sequencing

Figure S2. Nxtgen2plink.rb workflow summary.
A) Nxtgen2plink.rb WORKFLOW

1) The program generates a variable_sites file with all sites at which a variant was detected in at least one individual in the entire cohort.
2) For each individual, 3 files are generated:
   calls: good-quality variants in a modified .pileup format
   filtered: variants eliminated after calling (StrandBias or AlleleBalance)
   pileup: coverage for all the .variable_sites at which a variant was not present in the individual

B) SNP GENOTYPING

3) The program interrogates the individual files to generate the individual genotypes at all the variable_sites. For each variable site:
   Is the site in the calls file?
     Yes: There's a good variant! Generate Het or Hom variant genotype.
     No: Is the site in the filtered file?
       Yes: Genotype uncertain at the site! Generate missing genotype.
       No: Assess coverage at the site in the pileup file.
         ≥8X: No variant at the site! Generate wild-type genotype from the reference genome.
         <8X: Insufficient coverage! Generate missing genotype.

C) OUTPUT

4) The phenotypic information (gender, case/control status) is included in the output:

   PATIENT_ID  GENDER  CASE/CTRL_STATUS  SNP_1_chr1_12112837  SNP_2_chr3_1236167434  ...
   DVT_P_9833  1       1                 AA                   GT                     ...
   DVT_NC_128  1       0                 00                   GG                     ...

5) The output file is used to generate PLINK-compatible files. PLINK is then used for the downstream analyses:
   - Calculate MAFs (in the cohort and in case/control groups)
   - Genotype/phenotype association analysis (can be restricted to variants in a certain MAF range or with a certain genotype missingness)
   - Calculate missing genotypes per individual, per variant, and overall
   - Assess relatedness between individuals and population stratification

Figure S3. Representative coverage histograms. Coverage (x-axis) is plotted against number of reads (y-axis).
Panels: DVT_P_01, DVT_P_02, DVT_P_03, DVT_P_04. In each panel the x-axis (coverage) runs from 0 to 120 and the y-axis (number of reads) from 0 to 14000.
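
The genotyping decision in Figure S2, panel B, and the output row in panel C can be sketched in a few lines. This is a minimal illustration in Python, not the Nxtgen2plink.rb source: the function names, the dictionary and set inputs, the sample ID and the example sites are all hypothetical. Only the decision order (calls, then filtered, then the 8X coverage check) and the 00 code for a missing genotype are taken from the figure.

```python
from typing import Dict, List, Set, Tuple

Site = Tuple[str, int]  # (chromosome, coordinate); hypothetical representation

MIN_COVERAGE = 8  # coverage threshold from Figure S2, panel B (>=8X)
MISSING = "00"    # missing-genotype code, as in the Figure S2, panel C example


def genotype_site(site: Site,
                  calls: Dict[Site, str],
                  filtered: Set[Site],
                  pileup_depth: Dict[Site, int],
                  reference: Dict[Site, str]) -> str:
    """Return a two-letter genotype for one individual at one variable site."""
    if site in calls:
        # Good-quality variant: use the called Het or Hom genotype.
        return calls[site]
    if site in filtered:
        # Variant was called but removed by a filter: genotype uncertain.
        return MISSING
    if pileup_depth.get(site, 0) >= MIN_COVERAGE:
        # Enough reads and no variant: homozygous for the reference base.
        return reference[site] * 2
    # Too few reads to decide.
    return MISSING


def output_row(patient_id: str, gender: int, status: int,
               variable_sites: List[Site],
               calls: Dict[Site, str],
               filtered: Set[Site],
               pileup_depth: Dict[Site, int],
               reference: Dict[Site, str]) -> List[str]:
    """Build one output row: ID, gender, case/control status, genotypes."""
    genotypes = [genotype_site(s, calls, filtered, pileup_depth, reference)
                 for s in variable_sites]
    return [patient_id, str(gender), str(status)] + genotypes


# Hypothetical example data: four variable sites for one individual.
variable_sites = [("chr1", 100), ("chr1", 200), ("chr1", 300), ("chr1", 400)]
reference = {("chr1", 100): "A", ("chr1", 200): "C",
             ("chr1", 300): "T", ("chr1", 400): "G"}
calls = {("chr1", 100): "AG"}                         # site present in calls
filtered = {("chr1", 200)}                            # site present in filtered
pileup_depth = {("chr1", 300): 12, ("chr1", 400): 3}  # coverage from pileup

row = output_row("SAMPLE_01", 1, 1, variable_sites,
                 calls, filtered, pileup_depth, reference)
print(" ".join(row))  # SAMPLE_01 1 1 AG 00 TT 00
```

Keeping the three checks in this order matters: a site that was called and then filtered is reported as missing rather than wild-type, so filtered variants do not silently count as reference genotypes in the later PLINK analyses.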