Supplemental Methods for the article:
CHEK2*1100delC homozygosity in the Netherlands – prevalence and risk of breast and lung cancer

Petra E.A. Huijts1, Antoinette Hollestelle2, Brunilda Balliu3, Jeanine J. Houwing-Duistermaat3, Caro M. Meijers1, Jannet C. Blom2, Bahar Ozturk2, Elly M.M. Krol-Warmerdam4, Juul Wijnen5, Els M.J.J. Berns2, John W.M. Martens2, Caroline Seynaeve2, Lambertus A. Kiemeney6, Henricus F. van der Heijden7, Rob A.E.M. Tollenaar4, Peter Devilee1, Christi J. van Asperen5.

1 Department of Human Genetics, Leiden University Medical Center, Leiden
2 Department of Medical Oncology, Erasmus University Medical Center, Rotterdam
3 Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden
4 Department of Surgery, Leiden University Medical Center, Leiden
5 Department of Clinical Genetics, Leiden University Medical Center, Leiden
6 Department of Epidemiology, Biostatistics & HTA, Radboud University Medical Center, Nijmegen, The Netherlands
7 Department of Pulmonary Diseases, Radboud University Medical Center, Nijmegen, The Netherlands

Corresponding author: C.J. van Asperen, MD, PhD
Postal address: Department of Clinical Genetics, Postbus 9600, 2300 RC Leiden, the Netherlands.
E-mail address: asperen@lumc.nl
Telephone: 003171 526 6090
Fax: 003171 526 6749

Statistics

The statistical analysis was conducted in R 2.14.1. To use the information from all available studies, we used a combined approach [1]. We used a joint likelihood (Lc) to model the cases (the sporadic breast cancer cohort) and the controls from the blood bank, that is, we jointly modeled the disease status (Y) and the genotype (Gc) of each individual. Lc consists of two parts: the probability of disease conditional on an individual's genotype, P(Y=1|Gc), and the genotype distribution, P(Gc). Thus Lc = P(Y=1|Gc) P(Gc). The probability of disease P(Y=1|Gc) for each individual is modeled using logistic regression.
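Written out per individual, this factorization and its logistic part can be sketched as follows. The index i, the coefficient names, and the coding of the genotype as the number of copies of the variant allele (0, 1 or 2) are notational assumptions introduced here for illustration; the text only states that a logistic regression is used.

```latex
\[
L_c = \prod_i P(Y_i = y_i \mid G_{c,i}) \, P(G_{c,i}),
\qquad
\mathrm{logit}\, P(Y_i = 1 \mid G_{c,i})
  = \beta_0
  + \beta_{\mathrm{het}} \, \mathbf{1}[G_{c,i} = 1]
  + \beta_{\mathrm{hom}} \, \mathbf{1}[G_{c,i} = 2]
\]
```

Under this assumed coding, exp(β_het) and exp(β_hom) are the odds ratios of heterozygous and homozygous carriers relative to non-carriers.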
From this logistic regression part the odds ratios (ORs) for heterozygous and homozygous individuals are estimated. The genotype distribution P(Gc) is modeled assuming Hardy-Weinberg equilibrium. To protect against deviations from Hardy-Weinberg equilibrium, we use the disease prevalence p = 5% to put a different weight on the genotype distributions of cases and controls. To test the robustness of the method to this choice, we repeated the analysis for prevalence values between 1% and 10% and obtained similar results.

To obtain efficient estimates of the allele frequency and of the genotypic relative risks, we also included a model (Lu) for the individuals with unknown disease status: individuals from the CF cohort, and individuals from families with hereditary disease unrelated to cancer who were not affected by the disease running in their family. Because these individuals were collected as controls for studies other than breast cancer, we have no exact information about their breast cancer status. We therefore cannot treat them as controls or use them to estimate the ORs, but we can use the information they contain about the allele frequency. This model consists only of the genotype distribution P(Gu), where Gu are the genotypes of these individuals, and it is again modeled assuming Hardy-Weinberg equilibrium.

The two models (Lc and Lu) are then combined to obtain a common model for all data sets together. This method yields efficient estimates of all parameters, and it enables estimation of the OR for homozygotes by modeling the number of homozygotes using the allele frequency.

Reference List

1. Brunilda Balliu, Roula Tsonaka, Diane van der Woude, Stefan Boehringer, Jeanine J. Houwing-Duistermaat. Combining Family and Twin Data in Association Studies to Estimate the Noninherited Maternal Antigens Effect. Genet. Epidemiol. 2012. DOI: 10.1002/gepi.21667
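Illustrative sketch of the combined likelihood

To make the structure of the Statistics section concrete, the combination of Lc and Lu can be sketched as a log-likelihood function. This is a minimal illustration under stated assumptions, not the code used for the article (the analysis was run in R). The function names, the genotype coding (0, 1 or 2 copies of the variant allele) and the toy counts are hypothetical. The prevalence-based weighting of cases and controls is deliberately left out, because its exact form is not given in the text.

```python
import math


def hwe_probs(q):
    """Genotype probabilities for 0, 1, 2 variant alleles under Hardy-Weinberg
    equilibrium, given variant allele frequency q (0 < q < 1)."""
    return ((1 - q) ** 2, 2 * q * (1 - q), q ** 2)


def disease_prob(g, intercept, log_or_het, log_or_hom):
    """Logistic model for P(Y=1 | G=g) with separate log odds ratios for
    heterozygotes (g == 1) and homozygotes (g == 2)."""
    effect = (0.0, log_or_het, log_or_hom)[g]
    return 1.0 / (1.0 + math.exp(-(intercept + effect)))


def loglik_case_control(q, intercept, log_or_het, log_or_hom, cases, controls):
    """log Lc: sum over individuals of log P(Y | G) + log P(G).
    cases and controls are genotype counts (n0, n1, n2)."""
    pg = hwe_probs(q)
    total = 0.0
    for g in range(3):
        p1 = disease_prob(g, intercept, log_or_het, log_or_hom)
        total += cases[g] * (math.log(p1) + math.log(pg[g]))
        total += controls[g] * (math.log(1.0 - p1) + math.log(pg[g]))
    return total


def loglik_unknown(q, unknown):
    """log Lu: genotype distribution only, for individuals whose disease
    status is unknown. unknown is a tuple of genotype counts (n0, n1, n2)."""
    return sum(n * math.log(p) for n, p in zip(unknown, hwe_probs(q)))


def combined_loglik(q, intercept, log_or_het, log_or_hom,
                    cases, controls, unknown):
    """Combined model: log Lc + log Lu, sharing the allele frequency q."""
    return (loglik_case_control(q, intercept, log_or_het, log_or_hom,
                                cases, controls)
            + loglik_unknown(q, unknown))


# Made-up toy genotype counts (n0, n1, n2); NOT the study data.
toy_cases = (980, 19, 1)
toy_controls = (990, 10, 0)
toy_unknown = (495, 5, 0)

ll = combined_loglik(0.01, 0.0, math.log(2.0), math.log(4.0),
                     toy_cases, toy_controls, toy_unknown)
print(ll)
```

Because the allele frequency q enters both terms, individuals with unknown disease status contribute to its estimate without entering the logistic part. In practice the function would be maximized numerically over q and the regression coefficients.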