

Supplementary methods

Rationale for recoding of DRB1-DQA1-DQB1 genotypes

The use of a three-locus haplotype code instead of genotypes for HLA-DRB1, -DQA1 and -DQB1 is made practical by strong linkage disequilibrium (LD), which results in a limited number of haplotypes in any given population. Even though the HLA-DRB1 and -DQB1 loci capture the T1D risk,¹ we also included HLA-DQA1, as this allowed for more accurate haplotyping and in effect also increased the genotyping rates: if genotypes for either HLA-DRB1 or -DQB1 were missing, they could in most instances be deduced from haplotypes of HLA-DQA1 and the other locus. More importantly, the use of three-locus haplotypes instead of treating each locus individually has two major advantages with regard to conditional analyses. Firstly, it effectively decreases the number of variables, as there is only one conditional “locus” to consider. Secondly, the use of haplotypes accurately preserves the risk information contained in these loci. This is because the risk conferred by alleles at any individual locus in many cases depends on alleles at the other two, with higher or lower risk for certain combinations (e.g. DRB1*0404-DQA1*0301-DQB1*0302 confers significantly lower risk than DRB1*0401-DQA1*0301-DQB1*0302, and is even protective in certain populations¹). When also considering that LD patterns of alleles at the individual loci are likely to differ from those of haplotypes spanning all three, the use of haplotypes more effectively addresses the main task in our analyses: to control for secondary association due to LD with the DRB1-DQA1-DQB1 loci.
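The deduction of a missing locus from the other two can be illustrated with a minimal sketch. It assumes a small reference panel containing only the four haplotypes named in this supplement (the authors evaluated 30), and it works on single haplotypes rather than unphased genotypes; the function name `compatible_haplotypes` and the data layout are hypothetical and are not the authors' Excel rules.

```python
# Reference panel: only the four three-locus haplotypes named in the text,
# written as (DRB1, DQA1, DQB1). A real panel would be larger.
KNOWN_HAPLOTYPES = [
    ("0401", "0301", "0302"),
    ("0404", "0301", "0302"),
    ("07", "0201", "0201"),
    ("07", "0301", "0201"),
]


def compatible_haplotypes(drb1=None, dqa1=None, dqb1=None, panel=KNOWN_HAPLOTYPES):
    """Return the panel haplotypes that agree with every typed allele.

    An argument left as None is treated as a missing genotype.
    """
    observed = (drb1, dqa1, dqb1)
    return [
        hap
        for hap in panel
        if all(obs is None or obs == allele for obs, allele in zip(observed, hap))
    ]


# DRB1 missing: DQA1*0201 with DQB1*0201 leaves one candidate, so DRB1*07 is deduced.
print(compatible_haplotypes(dqa1="0201", dqb1="0201"))

# DQA1 missing: DRB1*07 with DQB1*0201 remains ambiguous between two haplotypes.
print(compatible_haplotypes(drb1="07", dqb1="0201"))
```

When exactly one panel haplotype is compatible, the missing allele follows directly; when several remain, additional information (for example haplotype frequency or family data) is needed to choose.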

Phasing of DRB1-DQA1-DQB1 haplotypes

DRB1-DQA1-DQB1 haplotypes were phased by comparison with known Caucasian haplotypes (a total of 30 unique haplotypes were evaluated), aided by custom-made rules in MS Office Excel that combined genotypes from at least two of the DRB1-DQA1-DQB1 loci (HLA-DRB1 with -DQB1, HLA-DQB1 with -DQA1, or both combinations, depending on the potential for ambiguity). For 49% of samples, this resulted in only two possible codes per individual. For the remaining samples, one, three or (rarely) four codes were generated. Of the samples with one code (19% of total), two-thirds were homozygous individuals, whereas the rest were assigned the other code by visual inspection. For samples with three or four codes (30% and 2%, respectively), the vast majority could be resolved by visual inspection, as only two of the haplotypes could possibly coexist in the same individual. In addition, samples with two codes were inspected for consistency within each individual, resulting in only minor changes. In rare cases, however, real ambiguity persisted; these samples were either resolved by choosing the most common haplotype (2% of total samples; 94% of these involved a choice between the rare DRB1*07-DQA1*0301-DQB1*0201 haplotype and the chosen DRB1*07-DQA1*0201-DQB1*0201 haplotype, owing to missing genotypes for HLA-DQA1) or removed (n = 4). In addition, 1% of samples (irrespective of the number of generated codes) were assigned the rare haplotype code “99” for one or both alleles after visual inspection of results, and <1% had to be deduced from genotypes of family members because of incomplete genotyping of the individual in question. After these procedures, which except for the deduced individuals were done mostly without regard to the results of other family members, results were checked for Mendelian consistency. This check resulted in removal of the haplotype code for 20 individuals and changes to the haplotype code in 10 individuals. As this represented <0.5% of total samples, and because the high number of distinct DRB1-DQA1-DQB1 haplotypes (27 of the 30 possible haplotypes were observed in the dataset before grouping of haplotypes with frequency <1%) should result in high detection rates by Mendelian tests, the total error rate must also have been very low.
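A Mendelian consistency check of the kind mentioned above could look roughly like the following sketch. It assumes each individual carries a pair of haplotype codes and ignores recombination within the DRB1-DQA1-DQB1 region; the function name and the example codes are hypothetical, not taken from the authors' workflow.

```python
def mendelian_consistent(child, father, mother):
    """Return True if the child's two haplotype codes can be split one per parent.

    Each argument is a pair of haplotype codes, e.g. ("3", "4").
    Recombination within the three-locus region is ignored.
    """
    c1, c2 = child
    return (c1 in father and c2 in mother) or (c2 in father and c1 in mother)


# Consistent: code "1" can come from the father and code "2" from the mother.
print(mendelian_consistent(("1", "2"), ("1", "3"), ("2", "4")))

# Inconsistent: the mother carries no copy of code "1".
print(mendelian_consistent(("1", "1"), ("1", "3"), ("2", "4")))
```

With many distinct haplotype codes in the sample, a wrongly assigned code is unlikely to match a parental code by chance, which is the reasoning behind the expected high detection rate.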

Grouping of rare haplotypes

Even though the complexity of conditioning on the HLA-DRB1, -DQA1 and -DQB1 genes was heavily reduced by treating them as a single locus, a large number of rare haplotypes were present in the T1DGC MHC dataset, each introducing a unique parameter. This frequently resulted in unreliable estimates when applying conditional logistic regression. Therefore, we chose to group haplotypes with a population frequency below 1%. Although these rare haplotypes individually are unlikely to contain much information, grouping approaches may still have unwanted side-effects in global conditional analyses,² as discussed below under “Artefacts of rare haplotype and genotype grouping”.
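The grouping rule (pooling haplotypes with a frequency below 1%) can be sketched as follows. This is a minimal illustration that assumes frequencies are computed from the list of codes passed in, and that the pooled group is labelled “99”; the source does not state how the authors computed the frequencies or labelled the group, so both are assumptions, as is the function name `group_rare`.

```python
from collections import Counter


def group_rare(haplotype_codes, threshold=0.01, pooled_label="99"):
    """Replace every code whose frequency in the list is below `threshold`
    with `pooled_label`, leaving all other codes unchanged."""
    counts = Counter(haplotype_codes)
    total = len(haplotype_codes)
    rare = {code for code, n in counts.items() if n / total < threshold}
    return [pooled_label if code in rare else code for code in haplotype_codes]


# Hypothetical sample of 200 haplotypes: "C" and "D" each occur once (0.5%).
codes = ["A"] * 150 + ["B"] * 48 + ["C", "D"]
grouped = group_rare(codes)
print(Counter(grouped))
```

Each pooled code removes one parameter from the conditional model, at the cost of treating haplotypes with possibly different risks as a single category.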

Artefacts of rare haplotype and genotype grouping

In our analyses, results for three of the SNPs identified by regression methods in region 3 (rs2076522, rs660895 and rs6457617) appeared to be artefacts directly related to the grouping of rare DRB1-DQA1-DQB1 haplotypes. On the other hand, the conditional allelic tests using haplotype estimates treated each DRB1-DQA1-DQB1 haplotype separately. Except for the tests conditioned on the grouped rare haplotypes themselves, these individual tests were therefore not affected. Hence, this approach served as a good validation of the results arrived at by the regression analyses, and revealed no effects on any of the other best-model SNPs (P > 0.01 for the rare group). Moreover, although HLA-B and -DPB1 showed some association on the grouped rare DRB1-DQA1-DQB1 haplotypes (Supplementary Table 3), this was only seen for certain alleles with a relatively small number of transmitted and non-transmitted haplotypes (T + NT < 85). Similarly, the pseudo-alleles HLA-C*99 and -B*99 (consisting of grouped rare alleles) showed significant association but few observations (T + NT < 63 for the results reported in Supplementary Table 3). These effects are therefore not likely to have had a large impact on the global regression results.

Since rs2076522 and rs6457617 were included when modelling the best HLA loci in step 3 (Table 5), we repeated the regression procedure without these two SNPs. Although the significance level of HLA-C was reduced, the overall results confirmed those reported in Table 5, with only HLA-C and HLA-B retaining significant associations (HLA-A P = 0.53; HLA-B P = 0.0030; HLA-C P = 0.0050; HLA-DPA1 P = 0.10 and HLA-DPB1 P = 0.40). In sum, therefore, the effects of rare haplotype grouping did not affect the main conclusions of steps 2 and 3 outside region 3.

Haplotype analyses

Transmitted (T) and non-transmitted (NT) haplotypes of test loci together with the DRB1-DQA1-DQB1 haplotypes were constructed from heterozygous parents using the FAMHAP program.³ Maximum-likelihood haplotype frequencies of the parental generation were calculated using child information, but irrespective of the affection status of the family members, using the EM algorithm as described in Becker & Knapp (2004).³ The haplotype frequencies of the whole sample were then used to compute conditional weights for the possible haplotype assignments per family.⁴ Thus, if a family was phase ambiguous, the different possible haplotype assignments together with their transmission parents were weighted accordingly.
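To make the idea of conditional weights concrete, the following is a simplified sketch and not FAMHAP's actual implementation. It assumes random mating and Hardy-Weinberg equilibrium, so that the probability of a candidate haplotype assignment is proportional to the product of the estimated frequencies of the haplotypes it contains; the function name, the haplotype labels and the frequencies are all hypothetical.

```python
from math import prod


def assignment_weights(assignments, freq):
    """Weight each candidate haplotype assignment by the product of its
    haplotype frequencies, normalised so the weights sum to 1."""
    raw = [prod(freq[h] for h in haplotypes) for haplotypes in assignments]
    total = sum(raw)
    if total == 0:
        raise ValueError("all candidate assignments have zero probability")
    return [r / total for r in raw]


# Hypothetical two-locus example: a double heterozygote (A/a at the first
# locus, B/b at the second) is either AB/ab or Ab/aB.
freq = {"AB": 0.4, "ab": 0.1, "Ab": 0.3, "aB": 0.2}
candidates = [("AB", "ab"), ("Ab", "aB")]
print(assignment_weights(candidates, freq))  # approximately [0.4, 0.6]
```

In a full family-based analysis the candidate assignments would also have to be compatible with the children's genotypes; that filtering step is omitted here.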

For conditional tests of alleles mapped on the DRB1-DQA1-DQB1 haplotypes, we used a variant of the haplotype method (HM),⁵,⁶ which involves comparing the relative frequencies of alleles at a test locus on haplotypes that are identical at a primary locus.

In particular, define the primary locus (in this case DRB1-DQA1-DQB1) as $A$ with alleles $A_i$, $i = 1, 2, \ldots, k_A$, and the test locus as $B$ with alleles $B_j$, $j = 1, 2, \ldots, k_B$. Then, under the null hypothesis that $A$ defines all the risk, for all $A_i$ and all $1 \le k < m \le k_B$:

\[
\frac{f_T(A_i\_B_k)}{f_{NT}(A_i\_B_k)} = \frac{f_T(A_i\_B_m)}{f_{NT}(A_i\_B_m)}
\]

where $f_T(\cdot)$ and $f_{NT}(\cdot)$ represent T and NT frequencies (calculated from all haplotypes with a given $A_i$ allele), respectively (“_” represents the haplotype connection). That is, although the number of T and NT for a given haplotype will differ depending on the risk conferred by $A_i$, the ratio of the T to NT frequencies under the null should stay the same regardless of the $B$ allele, for any $A_i$. Conversely, deviations from this relationship imply that an additional effect is marked by $B$ (i.e. an additional effect is conferred by $B$ itself or by a marker in high LD with $B$).

Tests for such deviation were performed for each allele at the test locus in 2×2 contingency tables using Pearson’s χ² statistic or Fisher’s exact test, with calculation of odds ratios to indicate the size and direction of any additional effect. This method is similar to the HM-TDT described in Cucca et al. (2001),⁷ except that FAMHAP allows for the inclusion of data from multiple children and allows for missing data, which can improve phase assignment and statistical power substantially.³
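As an illustration of the per-allele test, the sketch below computes Pearson's χ² statistic (1 degree of freedom, no continuity correction), its P value and the odds ratio for one 2×2 table. The layout of the table is an assumption: rows are haplotypes carrying the test allele versus all other haplotypes sharing the same DRB1-DQA1-DQB1 haplotype, and columns are T versus NT counts. Fisher's exact test is not shown, and the function name and counts are hypothetical.

```python
import math


def hm_allele_test(t_with, nt_with, t_without, nt_without):
    """Pearson chi-square test (1 df, uncorrected) and odds ratio for a 2x2 table.

    Rows: haplotypes with / without the test allele (same primary haplotype).
    Columns: transmitted (T) / non-transmitted (NT) counts.
    Returns (chi2, p_value, odds_ratio).
    """
    a, b, c, d = t_with, nt_with, t_without, nt_without
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if 0 in margins:
        raise ValueError("a row or column total is zero; the test is undefined")
    chi2 = n * (a * d - b * c) ** 2 / math.prod(margins)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    return chi2, p_value, odds_ratio


# Hypothetical counts: the test allele is over-transmitted on this haplotype.
chi2, p, odds = hm_allele_test(30, 10, 20, 40)
print(round(chi2, 2), p, odds)
```

An odds ratio above 1 indicates that the test allele is transmitted more often than expected from the primary haplotype alone, and below 1 less often. With small T + NT counts an exact test is the safer choice.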

Importantly, sibship dependencies due to linkage were ignored. This was evaluated against the alternative of using the proband dataset, which would have resulted in considerably less statistical power. This issue became especially important when considering haplotypes with more than one marker in addition to DRB1-DQA1-DQB1, where T/NT numbers quickly became too small to yield meaningful statistical tests. Moreover, we applied this test only to haplotypes of markers that showed significant evidence for association according to the regression analysis, i.e. only to markers for which association (and not just linkage) was already demonstrated.

Similar to the HM-TDT, our method assumes random mating and Hardy-Weinberg equilibrium in the founder population, due to the use of frequency estimates across all parents (as described above). Therefore, the results of this procedure may be influenced by population stratification, necessitating consideration of tests in defined subpopulations if stratification is expected (hence the testing in the three European subpopulations, in addition to the whole dataset). However, as argued by Dudbridge (2003), unlike case-control designs, the use of families involves a matched design, which in the face of an unmatched test (e.g. in the presence of population stratification) is likely to lead to reduced power rather than false positives.⁸

References

1. Thomson G, Valdes AM, Noble JA, Kockum I, Grote MN, Najman J et al. Relative predispositional effects of HLA class II DRB1-DQB1 haplotypes and genotypes on type 1 diabetes: a meta-analysis. Tissue Antigens 2007; 70(2): 110-127.

2. Nejentsev S, Howson JM, Walker NM, Szeszko J, Field SF, Stevens HE et al. Localization of type 1 diabetes susceptibility to the MHC class I genes HLA-B and HLA-A. Nature 2007; 450(7171): 887-892.

3. Becker T & Knapp M. Maximum-likelihood estimation of haplotype frequencies in nuclear families. Genet Epidemiol 2004; 27(1): 21-32.

4. Becker T & Knapp M. A powerful strategy to account for multiple testing in the context of haplotype analysis. Am J Hum Genet 2004; 75(4): 561-570.

5. Thomson G, Robinson WP, Kuhner MK, Joe S, MacDonald MJ, Gottschall JL et al. Genetic heterogeneity, modes of inheritance, and risk estimates for a joint study of Caucasians with insulin-dependent diabetes mellitus. Am J Hum Genet 1988; 43(6): 799-816.

6. Valdes AM & Thomson G. Detecting disease-predisposing variants: the haplotype method. Am J Hum Genet 1997; 60(3): 703-716.

7. Cucca F, Dudbridge F, Loddo M, Mulargia AP, Lampis R, Angius E et al. The HLA-DPB1-associated component of the IDDM1 and its relationship to the major loci HLA-DQB1, -DQA1, and -DRB1. Diabetes 2001; 50(5): 1200-1205.

8. Dudbridge F. Pedigree disequilibrium tests for multilocus haplotypes. Genet Epidemiol 2003; 25(2): 115-121.
