Dear Prof Jörg Epplen,
We would like to thank you and the reviewers for the constructive criticism of our manuscript.
We have implemented changes according to the reviewers’ comments and feel that our
manuscript has improved. Below we are providing a detailed description of the changes we have
made in accordance with the reviewers’ suggestions. We have also highlighted all major changes in
the manuscript.
Reviewer 1:
We would like to thank the reviewer for their suggestions. We have made changes to our
manuscript in accordance with their suggestions. More specifically:
1) There are a few scattered typos: We have corrected the typos.
2) Please add information on the controls including chronic diseases. Table 1 is missing: We
apologize for not including Table 1. We have included it with the revised manuscript. Table 1a
includes demographic information on cases and controls. Table 1b includes the genotype
characteristics of cases and controls. Cases and controls were not matched for the presence of
chronic diseases. We have added this in our methods section. Furthermore, in our discussion
section we have added this statement: “Information on chronic diseases was not recorded for
cases and controls with the possibility of discrepancy between the two groups. However FTO has
only shown to play a role in obesity and we controlled for BMI. Therefore such discrepancy
would likely not affect our findings.”
3) Elaborate on how you think these SNPs increase breast cancer risk via a demethylation
process: Recent data has shown that increased FTO expression results in increased food intake,
leading to increased adiposity. Thus, a gain-of-function effect is suggested for the implicated
human allele. Overexpression of Fto caused a dose-dependent increase in body weight and fat
mass[1]. This has been included in the discussion section of our manuscript.
Reviewer 2:
1) Table 1 is absent: We have added Table 1. We apologize for the omission.
2) Provide the LD patterns among the four SNPs: We calculated LD among the four SNPs and also
extracted LD values from HapMap. The results are shown in Table S1. The LD values obtained
from our data were comparable with those from the HapMap project. We found that these SNPs were
in strong LD for Caucasian and Asian samples but had reduced LD for Black samples. Due to the
strong LD among the SNPs, traditional multiple logistic regression models cannot be used. Thus, we
employed our Bayesian hierarchical logistic models, which can accommodate multiple SNPs
with strong LD in the analysis.
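As a minimal sketch of one common way to summarize pairwise LD from unphased genotype data, the squared Pearson correlation between minor-allele counts (coded 0/1/2) can be computed as below. This is an illustration only: the function name and the toy genotypes are hypothetical, and it is not necessarily the procedure used to produce Table S1, which may rely on haplotype-based measures.

```python
def genotype_r2(g1, g2):
    """Squared Pearson correlation between two SNPs coded as 0/1/2 allele counts.

    This genotype-based r^2 is a widely used approximation to haplotype-based
    r^2 when phase is unknown. Inputs are equal-length lists, one entry per subject.
    """
    n = len(g1)
    if n == 0 or n != len(g2):
        raise ValueError("genotype vectors must be non-empty and of equal length")
    mean1 = sum(g1) / n
    mean2 = sum(g2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(g1, g2))
    var1 = sum((a - mean1) ** 2 for a in g1)
    var2 = sum((b - mean2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return cov * cov / (var1 * var2)


# Hypothetical toy genotypes (not study data)
snp_a = [0, 1, 2, 1]
snp_b = [0, 1, 2, 1]   # identical to snp_a, so r^2 is 1
snp_c = [0, 0, 2, 2]
snp_d = [0, 2, 0, 2]   # uncorrelated with snp_c, so r^2 is 0
```

Values near 1 indicate SNPs that carry largely redundant information, which is the situation that destabilizes a classical multiple logistic regression.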
3) The manuscript does not present standard association analyses for the four SNPs individually:
We performed the traditional logistic regression for each SNP separately and presented the
corresponding results in Table 2b. The analysis confirmed that rs1477196 was significantly
associated with the risk of breast cancer through the additive effect after Bonferroni
correction for multiple testing.
4) The results shown in Table 3 are not convincing without further information on the correlation
patterns among the predictor variables in logistic regression analyses: This comment is related to
the issue of identifiability for logistic (and other generalized linear) models. There are several
reasons that a classical logistic regression can be non-identifiable (that is, have parameters that
cannot be estimated from the available data or whose estimates are unstable): 1) collinearity among
predictors, 2) separation, which arises when a predictor or a linear combination of predictors is
completely aligned with the outcome, 3) many predictors, and 4) low frequencies for categorical
predictors. These problems can occur in multiple-SNP and interaction analysis in genetic
association analysis, because a) SNPs are usually in linkage disequilibrium, introducing
correlated variables; b) SNP data often include genotypes with low frequencies that create
predictors with little variation, especially for interaction terms; c) because SNP data are discrete,
separation can be a serious problem in case-control association studies.
Standard methods for overcoming these problems are penalized likelihood regressions or
Bayesian modeling. The key to these approaches is the use of a penalty on model complexity (in
the penalized likelihood framework) or continuous prior distributions for genetic effects (in
the Bayesian framework) that constrain coefficients to lie in a reasonable range[2,3]. The Bayesian
hierarchical models (or penalized regressions) are identified, and, thus, the resulting estimate is
well defined and has finite variance, even if the original data have problems that would result in
non-identifiability of the maximum likelihood estimate. The Bayesian method of Gelman et al.[3]
was developed specifically to deal with the above problems and can be used in routine data analysis
[2].
In this manuscript, we used the Bayesian method described in Yi and Banerjee [4] and Yi et al. [5]
to perform multiple-SNP and interaction analyses. The method of Yi and Banerjee [4] and Yi et
al. [5] is very similar to that of Gelman et al. [3], and thus results in well-identified models and
produces stable estimates of coefficients. In contrast, we found that for our data the multiple-SNP
epistatic models are non-identifiable if we use classical logistic regression.
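The identifiability argument in this response can be illustrated with a small sketch. In the toy data below the predictor separates cases from controls completely, so the maximum likelihood slope is infinite, whereas adding a prior on the slope gives a finite estimate. Note the assumptions: the sketch uses a normal prior (equivalent to a ridge penalty) rather than the Cauchy-type prior of Gelman et al. [3] or the hierarchical model used in the manuscript, and the function name, the prior scale tau = 2.5, the step size and the data are all hypothetical choices made for illustration.

```python
from math import exp


def fit_logistic_normal_prior(x, y, tau=2.5, step=0.1, iters=5000):
    """Posterior-mode estimate of (intercept, slope) for a one-predictor logistic model.

    The slope has a Normal(0, tau^2) prior; the intercept is left unpenalized.
    Plain gradient ascent on the log posterior, adequate for a tiny example.
    """
    a, b = 0.0, 0.0
    for _ in range(iters):
        grad_a = 0.0
        grad_b = -b / tau ** 2          # gradient of the log prior on the slope
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + exp(-(a + b * xi)))   # fitted probability of being a case
            grad_a += yi - p
            grad_b += (yi - p) * xi
        a += step * grad_a
        b += step * grad_b
    return a, b


# Hypothetical toy data with complete separation: x = 1 only in cases (y = 1)
x = [0, 0, 1, 1]
y = [0, 0, 1, 1]
intercept, slope = fit_logistic_normal_prior(x, y)
```

Without the prior term the slope would keep growing with every iteration; with it, the estimate settles at a finite value, which is the sense in which the penalized or Bayesian model is identified.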
5) Please explain the ROC results here in the context of other studies: In the paper by
Wacholder et al.[6], a genetic model that included 10 SNPs found in GWAS to be associated
with breast cancer risk produced an AUC of 58.9%. In that study the SNP-only model predicted
risk slightly better than the Gail model. Another study by Mealiffe et al.[7] also confirmed that a
SNP-only model produced a higher AUC (58.7%) compared with the Gail model (55.7%). The
AUCs in these studies are somewhat lower than the AUC obtained in our study. The two
major reasons for this improvement in prediction are: 1) we used a Bayesian hierarchical model to
fit the data and to predict disease risk and, as described in Gelman et al. [3], a Bayesian
hierarchical model can improve prediction accuracy; 2) the previous studies did not
include interactions in their predictive models. If genetic interactions are present, adding these
interactions to a predictive model can increase the accuracy of prediction[8-11]. However, we do
recognize that this is a single study and we would need to validate our results before we can be
certain of the magnitude of our findings. The above has been added to the discussion section of
our manuscript.
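For readers less familiar with the AUC figures quoted in this response, the following minimal sketch shows what the statistic measures: the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. The function name and the toy scores are hypothetical, and this is not the code used to obtain the AUC reported in the manuscript.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise case-control comparisons.

    scores: predicted risks; labels: 1 for cases, 0 for controls.
    Equivalent to the Mann-Whitney U statistic divided by (n_cases * n_controls).
    """
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0      # case ranked above control
            elif c == k:
                wins += 0.5      # tie counts as half
    return wins / (len(cases) * len(controls))


# Hypothetical toy predictions (not study data)
toy_scores = [0.9, 0.3, 0.8, 0.2]
toy_labels = [1, 1, 0, 0]
toy_auc = auc(toy_scores, toy_labels)
```

An AUC of 0.5 corresponds to prediction no better than chance, which is why values in the high fifties, such as those cited from the earlier studies, represent only modest discrimination.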
6) Please discuss the control group response rate of 25%: Healthy volunteers were approached in
outpatient clinics. It is not uncommon for most of the individuals approached to decline
participation in a genetic study. However, the controls used in this study were selected from a
pool of over 5000 individuals. This increases the validity of our study.
7) Address the multiple testing issues from Table 2: We calculated the adjusted p-values in Table
2 using the Bonferroni correction for multiple testing. For both the multiple-SNP and single-SNP
analyses using logistic regression, we showed that rs1477196 was significantly
associated with the risk of breast cancer through the additive effect after the Bonferroni
correction at the nominal level of 0.05.
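The Bonferroni adjustment referred to above multiplies each raw p-value by the number of tests and caps the result at 1. A minimal sketch follows; the function name and the p-values are hypothetical and are not taken from Table 2.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values: min(1, p * m) for m tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for four tests (not the values in Table 2)
raw = [0.01, 0.5, 0.2, 0.004]
adjusted = bonferroni(raw)
# A test is declared significant at the nominal 0.05 level if its adjusted p-value is below 0.05
significant = [p < 0.05 for p in adjusted]
```

Comparing the adjusted p-value with 0.05 is equivalent to comparing the raw p-value with 0.05 divided by the number of tests.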
Reference List
1. Church C, Moir L, McMurray F, Girard C, Banks GT, Teboul L et al.: Overexpression
of Fto leads to increased food intake and results in obesity. Nat Genet 2010, 42: 1086-1092.
2. Gelman A, Hill J: Data Analysis Using Regression and Multilevel/Hierarchical Models.
Cambridge University Press, New York; 2007.
3. Gelman A, Jakulin A, Pittau MG, Su YS: A weakly informative default prior
distribution for logistic and other regression models. Annals of Applied Statistics 2008,
2: 1360-1383.
4. Yi N, Banerjee S: Hierarchical generalized linear models for multiple quantitative
trait locus mapping. Genetics 2009, 181: 1101-1113.
5. Yi N, Kaklamani VG, Pasche B: Bayesian Analysis of Genetic Interactions in Case-Control Studies, With Application to Adiponectin Genes and Colorectal Cancer
Risk. Annals of Human Genetics (in press) 2010.
6. Wacholder S, Hartge P, Prentice R, Garcia-Closas M, Feigelson HS, Diver WR et al.:
Performance of common genetic variants in breast-cancer risk models. N Engl J Med
2010, 362: 986-993.
7. Mealiffe ME, Stokowski RP, Rhees BK, Prentice RL, Pettinger M, Hinds DA:
Assessment of clinical validity of a breast cancer risk model combining genetic and
clinical information. J Natl Cancer Inst 2010, 102: 1618-1627.
8. Yi N, Kaklamani VG, Pasche B: Bayesian analysis of genetic interactions in case-control studies, with application to adiponectin genes and colorectal cancer risk. Ann
Hum Genet 2011, 75: 90-104.
9. Clark AG: Limits to prediction of phenotype from knowledge of genotypes. Limits to
knowledge in evolutionary genetics. Edited by M Clegg et al. Kluwer Academic/Plenum
Publishers, New York; 2000:205-224.
10. Moore JH, Williams SM: Epistasis and its implications for personal genetics. Am J
Hum Genet 2009, 85: 309-320.
11. Yi N: Statistical analysis of genetic interactions. Genetics Research 2011 (in press).