Selecting explanatory variables with the modified version of Bayesian Information Criterion Małgorzata Bogdan Institute of Mathematics and Computer Science, Wrocław University of Technology, Poland in cooperation with J.K.Ghosh, R.W.Doerge, R. Cheng – Purdue University A. Baierl, F. Frommlet, A. Futschik – Vienna University A. Chakrabarti - Indian Statistical Institute P. Biecek, A. Ochman, M. Żak – Wrocław University of Technology Vienna, 24/07/2008 Małgorzata Bogdan Modified BIC Searching large data bases Y - the quantitative variable of interest (fruit size, survival time, process yield) Małgorzata Bogdan Modified BIC Searching large data bases Y - the quantitative variable of interest (fruit size, survival time, process yield) Aim – identify factors influencing Y Małgorzata Bogdan Modified BIC Searching large data bases Y - the quantitative variable of interest (fruit size, survival time, process yield) Aim – identify factors influencing Y Properties of the data base – number of potential factors, m, may be much larger than the number of cases, n Małgorzata Bogdan Modified BIC Searching large data bases Y - the quantitative variable of interest (fruit size, survival time, process yield) Aim – identify factors influencing Y Properties of the data base – number of potential factors, m, may be much larger than the number of cases, n Assumption of Sparsity - only a small proportion of potential explanatory variables influences Y Małgorzata Bogdan Modified BIC Specific application - Locating Quantitative Trait Loci Małgorzata Bogdan Modified BIC Data for QTL mapping in backcross population and recombinant inbred lines Only two genotypes possible at a given locus Małgorzata Bogdan Modified BIC Data for QTL mapping in backcross population and recombinant inbred lines Only two genotypes possible at a given locus Xij - dummy variable encoding the genotype of i-th individual at locus j Małgorzata Bogdan Modified BIC Data for QTL mapping in backcross population and recombinant inbred lines Only two genotypes possible at a given locus Xij - dummy variable encoding the genotype of i-th individual at locus j Xij ∈ {−1/2, 1/2} Małgorzata Bogdan Modified BIC Data for QTL mapping in backcross population and recombinant inbred lines Only two genotypes possible at a given locus Xij - dummy variable encoding the genotype of i-th individual at locus j Xij ∈ {−1/2, 1/2} Multiple regression model: Yi = β0 + m X βj Xij + i , j=1 where i ∈ {1, . . . , n} and i ∼ N(0, σ 2 ) Małgorzata Bogdan Modified BIC (0.1) Data for QTL mapping in backcross population and recombinant inbred lines Only two genotypes possible at a given locus Xij - dummy variable encoding the genotype of i-th individual at locus j Xij ∈ {−1/2, 1/2} Multiple regression model: Yi = β0 + m X βj Xij + i , j=1 where i ∈ {1, . . . , n} and i ∼ N(0, σ 2 ) Problem : estimation of the number of influential genes Małgorzata Bogdan Modified BIC (0.1) Bayesian Information Criterion (1) Mi - i-th linear model with ki < n regressors Małgorzata Bogdan Modified BIC Bayesian Information Criterion (1) Mi - i-th linear model with ki < n regressors θi = (β0 , β1 , . . . , βki , σ) - vector of model parameters Małgorzata Bogdan Modified BIC Bayesian Information Criterion (1) Mi - i-th linear model with ki < n regressors θi = (β0 , β1 , . . . , βki , σ) - vector of model parameters Bayesian Information Criterion (Schwarz, 1978) – maximize BIC = log L(Y |Mi , θ̂i ) − 12 ki log n Małgorzata Bogdan Modified BIC Bayesian Information Criterion (1) Mi - i-th linear model with ki < n regressors θi = (β0 , β1 , . . . , βki , σ) - vector of model parameters Bayesian Information Criterion (Schwarz, 1978) – maximize BIC = log L(Y |Mi , θ̂i ) − 12 ki log n If m is fixed, n → ∞ and X 0 X /n → Q, where Q is a positive definite matrix, then BIC is consistent - the probability of choosing the proper model converges to 1. Małgorzata Bogdan Modified BIC Bayesian Information Criterion (1) Mi - i-th linear model with ki < n regressors θi = (β0 , β1 , . . . , βki , σ) - vector of model parameters Bayesian Information Criterion (Schwarz, 1978) – maximize BIC = log L(Y |Mi , θ̂i ) − 12 ki log n If m is fixed, n → ∞ and X 0 X /n → Q, where Q is a positive definite matrix, then BIC is consistent - the probability of choosing the proper model converges to 1. When n ≥ 8 BIC never chooses more regressors than AIC and is usually considered as one of the most restrictive model selection criteria. Małgorzata Bogdan Modified BIC Bayesian Information Criterion (1) Mi - i-th linear model with ki < n regressors θi = (β0 , β1 , . . . , βki , σ) - vector of model parameters Bayesian Information Criterion (Schwarz, 1978) – maximize BIC = log L(Y |Mi , θ̂i ) − 12 ki log n If m is fixed, n → ∞ and X 0 X /n → Q, where Q is a positive definite matrix, then BIC is consistent - the probability of choosing the proper model converges to 1. When n ≥ 8 BIC never chooses more regressors than AIC and is usually considered as one of the most restrictive model selection criteria. Surprise ? : - Broman and Speed (JRSS, 2002) report that BIC overestimates the number of regressors when applied to QTL mapping. Małgorzata Bogdan Modified BIC Explanation - Bayesian roots of BIC (1) f (θi ) – prior density of θi , π(Mi ) – prior probability of Mi Małgorzata Bogdan Modified BIC Explanation - Bayesian roots of BIC (1) f (θi ) – prior density of θi , π(Mi ) – prior probability of Mi R mi (Y ) = L(Y |Mi , θi )f (θi )d θi – integrated likelihood of the data given the model Mi Małgorzata Bogdan Modified BIC Explanation - Bayesian roots of BIC (1) f (θi ) – prior density of θi , π(Mi ) – prior probability of Mi R mi (Y ) = L(Y |Mi , θi )f (θi )d θi – integrated likelihood of the data given the model Mi posterior probability of Mi : P(Mi |Y ) ∝ mi (Y )π(Mi ) Małgorzata Bogdan Modified BIC Explanation - Bayesian roots of BIC (1) f (θi ) – prior density of θi , π(Mi ) – prior probability of Mi R mi (Y ) = L(Y |Mi , θi )f (θi )d θi – integrated likelihood of the data given the model Mi posterior probability of Mi : P(Mi |Y ) ∝ mi (Y )π(Mi ) BIC neglects π(Mi ) and uses approximation log mi (Y ) ≈ log L(Y |Mi , θ̂i ) − 1/2(ki + 2) log n + Ri , Ri is bounded in n. Małgorzata Bogdan Modified BIC Explanation - Bayesian roots of BIC (2) neglecting π(Mi ) ≡ assuming all the models have the same prior probability Małgorzata Bogdan Modified BIC Explanation - Bayesian roots of BIC (2) neglecting π(Mi ) ≡ assuming all the models have the same prior probability ≡ assigning a large prior probability to the event that the true model contains approximately m2 regressors Małgorzata Bogdan Modified BIC Explanation - Bayesian roots of BIC (2) neglecting π(Mi ) ≡ assuming all the models have the same prior probability ≡ assigning a large prior probability to the event that the true model contains approximately m2 regressors 200 m=200, 200 models with one regressor, 2 = 19900 models 200 with two regressors, 100 = 9 × 1058 models with 100 regressors Małgorzata Bogdan Modified BIC Modified version of BIC, mBIC (1) M. Bogdan, J.K. Ghosh,R.W. Doerge, Genetics (2004) Proposed solution - supplementing BIC with an informative prior distribution on the set of possible models, proposed in George and McCulloch (1993) Małgorzata Bogdan Modified BIC Modified version of BIC, mBIC (1) M. Bogdan, J.K. Ghosh,R.W. Doerge, Genetics (2004) Proposed solution - supplementing BIC with an informative prior distribution on the set of possible models, proposed in George and McCulloch (1993) p - prior probability that a randomly chosen regressor influences Y π(Mi ) = p ki (1 − p)m−ki Małgorzata Bogdan Modified BIC Modified version of BIC, mBIC (1) M. Bogdan, J.K. Ghosh,R.W. Doerge, Genetics (2004) Proposed solution - supplementing BIC with an informative prior distribution on the set of possible models, proposed in George and McCulloch (1993) p - prior probability that a randomly chosen regressor influences Y π(Mi ) = p ki (1 − p)m−ki log π(Mi ) = m log(1 − p) − ki log Małgorzata Bogdan Modified BIC 1−p p Modified version of BIC, mBIC (1) M. Bogdan, J.K. Ghosh,R.W. Doerge, Genetics (2004) Proposed solution - supplementing BIC with an informative prior distribution on the set of possible models, proposed in George and McCulloch (1993) p - prior probability that a randomly chosen regressor influences Y π(Mi ) = p ki (1 − p)m−ki 1−p log π(Mi ) = m log(1 − p) − ki log p Modified version of BIC recommends choosing the model maximizing 1 log L(Y |Mi , θ̂i ) − ki log n − ki log 2 Małgorzata Bogdan Modified BIC 1−p p mBIC (2) c = mp - expected number of true regressors Małgorzata Bogdan Modified BIC mBIC (2) c = mp - expected number of true regressors m 1 mBIC = log L(Y |Mi , θ̂i ) − ki log n − ki log −1 2 c Małgorzata Bogdan Modified BIC mBIC (2) c = mp - expected number of true regressors m 1 mBIC = log L(Y |Mi , θ̂i ) − ki log n − ki log −1 2 c Standard version of mBIC uses c = 4 to control the overall type I error at the level below 10% Małgorzata Bogdan Modified BIC mBIC (2) c = mp - expected number of true regressors m 1 mBIC = log L(Y |Mi , θ̂i ) − ki log n − ki log −1 2 c Standard version of mBIC uses c = 4 to control the overall type I error at the level below 10% A similar log m penalty appears also in RIC of Foster and George (1994) Małgorzata Bogdan Modified BIC Relationship to multiple testing (1) Orthogonal design: X T X = nI(m+1)×(m+1), Małgorzata Bogdan Modified BIC (1) Relationship to multiple testing (1) Orthogonal design: X T X = nI(m+1)×(m+1), BIC chooses those Xj ’s for which nβ̂j2 σ2 Małgorzata Bogdan > log n Modified BIC (1) Relationship to multiple testing (1) Orthogonal design: X T X = nI(m+1)×(m+1), BIC chooses those Xj ’s for which nβ̂j2 > log n σ2 √ Under H0j : βj = 0, Zj = nβ̂j σ Małgorzata Bogdan ∼ N(0, 1) Modified BIC (1) Relationship to multiple testing (1) Orthogonal design: X T X = nI(m+1)×(m+1), BIC chooses those Xj ’s for which nβ̂j2 > log n σ2 √ Under H0j : βj = 0, Since for c > 0, Zj = 1 − Φ(c) = nβ̂j σ ∼ N(0, 1) φ(c) c (1 + oc ) Małgorzata Bogdan Modified BIC (1) Relationship to multiple testing (1) Orthogonal design: X T X = nI(m+1)×(m+1), (1) BIC chooses those Xj ’s for which nβ̂j2 > log n σ2 √ Under H0j : βj = 0, Since for c > 0, Zj = 1 − Φ(c) = nβ̂j σ ∼ N(0, 1) φ(c) c (1 + oc ) It holds that for large values of n s αn = 2P(Zj > p Małgorzata Bogdan log n) ≈ Modified BIC 2 . πn log n Relationship to multiple testing (2) When n and m go to infinity and the number of true signals remains fixed, the expected number of “false discoveries” is of the rate √ m . n log n Małgorzata Bogdan Modified BIC Relationship to multiple testing (2) When n and m go to infinity and the number of true signals remains fixed, the expected number of “false discoveries” is of the rate √ m . n log n Corollary: BIC is not consistent when √ m n log n Małgorzata Bogdan Modified BIC →∞ . Bonferroni correction for multiple testing : αn,m = Małgorzata Bogdan Modified BIC αn m Bonferroni correction for multiple testing : αn,m = αn m probability of detecting at least one “false positive”: FWER ≤ αn Małgorzata Bogdan Modified BIC Bonferroni correction for multiple testing : αn,m = αn m probability of detecting at least one “false positive”: FWER ≤ αn √ 2(1 − Φ( cBon )) = αmn Małgorzata Bogdan Modified BIC Bonferroni correction for multiple testing : αn,m = αn m probability of detecting at least one “false positive”: FWER ≤ αn √ 2(1 − Φ( cBon )) = αmn cBon = 2 log m αn (1 + on,m ) = (log n + 2 log m)(1 + on,m ) where on,m converges to zero when n or m tends to infinity. Małgorzata Bogdan Modified BIC Bonferroni correction for multiple testing : αn,m = αn m probability of detecting at least one “false positive”: FWER ≤ αn √ 2(1 − Φ( cBon )) = αmn cBon = 2 log m αn (1 + on,m ) = (log n + 2 log m)(1 + on,m ) where on,m converges to zero when n or m tends to infinity. cmBIC = log n + 2 log m c − 1 ≈ log n + 2 log m − 2 log c Małgorzata Bogdan Modified BIC Properties of mBIC 1. FWER ≈ q 2 π √ c n(log n+2 log m−2 log c) Małgorzata Bogdan Modified BIC Properties of mBIC 1. FWER ≈ q 2 π √ c n(log n+2 log m−2 log c) 2. The power of detecting the explanatory variable with βj 6= 0 is given by ! √ √ √ √ nβj n(β̂j − βj ) √ nβj < < cmBIC − 1 − P − cmBIC − σ σ σ √ nβj √ →1 >1−Φ cmBIC − σ Małgorzata Bogdan Modified BIC , Properties of mBIC 1. FWER ≈ q 2 π √ c n(log n+2 log m−2 log c) 2. The power of detecting the explanatory variable with βj 6= 0 is given by ! √ √ √ √ nβj n(β̂j − βj ) √ nβj < < cmBIC − 1 − P − cmBIC − σ σ σ √ nβj √ →1 >1−Φ cmBIC − σ Corollary: Independently on the choice of c mBIC is consistent Małgorzata Bogdan Modified BIC , Properties of mBIC 1. FWER ≈ q 2 π √ c n(log n+2 log m−2 log c) 2. The power of detecting the explanatory variable with βj 6= 0 is given by ! √ √ √ √ nβj n(β̂j − βj ) √ nβj < < cmBIC − 1 − P − cmBIC − σ σ σ √ nβj √ →1 >1−Φ cmBIC − σ Corollary: Independently on the choice of c mBIC is consistent The standard version of mBIC uses c = 4 to control FWER at the level below 10%, when n ≥ 200. Małgorzata Bogdan Modified BIC , Asymptotic optimality of mBIC (1) γ0 - cost of the false discovery, γA - cost of missing the true signals Małgorzata Bogdan Modified BIC Asymptotic optimality of mBIC (1) γ0 - cost of the false discovery, γA - cost of missing the true signals βj ∼ (1 − p)δ0 + pN(0, τ 2 ) Małgorzata Bogdan Modified BIC Asymptotic optimality of mBIC (1) γ0 - cost of the false discovery, γA - cost of missing the true signals βj ∼ (1 − p)δ0 + pN(0, τ 2 ) Expected value of the experiment cost: R = m(γ0 t1 (1 − p) + γA t2 p), where t1 and t2 are type I and type II errors Małgorzata Bogdan Modified BIC Asymptotic optimality of mBIC (1) γ0 - cost of the false discovery, γA - cost of missing the true signals βj ∼ (1 − p)δ0 + pN(0, τ 2 ) Expected value of the experiment cost: R = m(γ0 t1 (1 − p) + γA t2 p), where t1 and t2 are type I and type II errors Optimal rule: Bayes oracle fA (β̂j ) f0 (β̂j ) where fA (β̂j ) ∼ N(0, τ 2 + σ2 n ) > (1 − p)γ0 , pγA 2 and f0 (β̂j ) ∼ N(0, σn ) Małgorzata Bogdan Modified BIC Asymptotic optimality of mBIC (2) Bayes oracle nβ̂j2 σ2 2 γ0 σ 2 + nτ 2 nτ + σ 2 1−p + 2 log > log + 2 log 2 2 nτ σ p γA . Małgorzata Bogdan Modified BIC Asymptotic optimality of mBIC (2) Bayes oracle nβ̂j2 σ2 2 γ0 σ 2 + nτ 2 nτ + σ 2 1−p + 2 log > log + 2 log 2 2 nτ σ p γA Asymptotic Optimality: the model selection rule V is asymptotically optimal if lim n→∞,m→∞ RV =1 . RBO . Małgorzata Bogdan Modified BIC Asymptotic optimality of mBIC (2) Bayes oracle nβ̂j2 σ2 2 γ0 σ 2 + nτ 2 nτ + σ 2 1−p + 2 log > log + 2 log 2 2 nτ σ p γA Asymptotic Optimality: the model selection rule V is asymptotically optimal if lim n→∞,m→∞ RV =1 . RBO Theorem 1 (Bogdan, Chakrabarti, Ghosh, 2008). Under orthogonal design (1) mBIC is asymptotically optimal when limm→∞ mp = s, where s ∈ R. Małgorzata Bogdan Modified BIC Asymptotic optimality of mBIC (2) Bayes oracle nβ̂j2 σ2 2 γ0 σ 2 + nτ 2 nτ + σ 2 1−p + 2 log > log + 2 log 2 2 nτ σ p γA Asymptotic Optimality: the model selection rule V is asymptotically optimal if lim n→∞,m→∞ RV =1 . RBO Theorem 1 (Bogdan, Chakrabarti, Ghosh, 2008). Under orthogonal design (1) mBIC is asymptotically optimal when limm→∞ mp = s, where s ∈ R. Conjecture (Frommlet, Bogdan, 2008). Theorem 1 holds also when βj ∼ (1 − p)δ0 + pFA , where FA has a positive density at 0. Małgorzata Bogdan Modified BIC Computer simulations(1) Setting : n = 200, m = 300, entries of X ∼ N(0, σ = 0.5), k ∼ Binomial (m, p), with p = 1 30 (mp = 10), βi ∼ N(0, σ = 1.5), ε ∼ N(0, 1) and Tukey’s gross error model: ε ∼ Tukey (0.95, 100, 1) = 0.95 ∗ N(0, 1) + 0.05 ∗ N(0, 10). Małgorzata Bogdan Modified BIC Computer simulations(1) Setting : n = 200, m = 300, entries of X ∼ N(0, σ = 0.5), k ∼ Binomial (m, p), with p = 1 30 (mp = 10), βi ∼ N(0, σ = 1.5), ε ∼ N(0, 1) and Tukey’s gross error model: ε ∼ Tukey (0.95, 100, 1) = 0.95 ∗ N(0, 1) + 0.05 ∗ N(0, 10). Characteristics : Power, FDR = P 2 l2 = m j=1 (βj − β̂j ) FP AP , MR = FP + FN, mean value of the absolute prediction error based on 50 additional observations, d Małgorzata Bogdan Modified BIC Computer simulations Table: Results for 1000 replications. noise citerion FP FN Power FDR MR l2 d BIC 13.3 1.84 0.8155 0.5889 15.1480 2.3610 0.9460 E |ε1 | ≈ 0.8 N(0,1) mBIC 0.073 2.97 0.7030 0.0107 3.0410 0.6025 0.8505 rBIC 0.08 3.45 0.6586 0.0116 3.5310 0.8500 0.8687 Tukey(0.95, 100, 1) BIC mBIC rBIC 12.5 0.08 0.1 3.95 6.11 4.29 0.6087 0.3923 0.5806 0.6487 0.0210 0.0162 16.4440 6.1910 4.3910 13.51 4.732 1.597 1.714 1.503 1.298 , E |ε2 | ≈ 1.16 Małgorzata Bogdan Modified BIC Applications for QTL mapping Yi = µ + X βj Xij + j∈I X γuv Xiu Xiv + εi , (u,v )∈U I - a certain subset of the set N = {1, . . . , m}, U - a certain subset of N × N Małgorzata Bogdan Modified BIC Applications for QTL mapping Yi = µ + X βj Xij + j∈I X γuv Xiu Xiv + εi , (u,v )∈U I - a certain subset of the set N = {1, . . . , m}, U - a certain subset of N × N Standard version of mBIC - minimize n log(RSS)+(p+r ) log(n)+2p log(m/2.2−1)+2r log(Ne /2.2−1) p - number of main effects, r - number of interactions, Ne = m(m − 1)/2 Małgorzata Bogdan Modified BIC Further applications for QTL mapping 1. Extending to more complicated genetic scenarios + iterative version of mBIC : Baierl, Bogdan, Frommlet, Futschik Genetics, 2006 2. Robust versions based on M-estimates: Baierl, Futschik, Bogdan, Biecek CSDA, 2007 3. Rank version: Żak, Baierl, Bogdan, Futschik Genetics, 2007 4. Taking into account the correlations between neighboring markers: Bogdan, Frommlet, Biecek, Cheng, Ghosh, Doerge, Biometrics, 2008 Małgorzata Bogdan Modified BIC Real Data Analysis (1) Huttunen et al (2004) - data on the variation in male courtship song characters in Drosophila virilis. Małgorzata Bogdan Modified BIC Real Data Analysis (2) Drosophila "sing" by vibrating their wings. The most common song type is pulse song and consists of rapid transients (short-lived oscillations) of low frequency. Małgorzata Bogdan Modified BIC Real Data Analysis (2) Drosophila "sing" by vibrating their wings. The most common song type is pulse song and consists of rapid transients (short-lived oscillations) of low frequency. Quantitative trait PN - number of pulses in a pulse train. Małgorzata Bogdan Modified BIC Real Data Analysis (2) Drosophila "sing" by vibrating their wings. The most common song type is pulse song and consists of rapid transients (short-lived oscillations) of low frequency. Quantitative trait PN - number of pulses in a pulse train. Data - 24 markers on three chromosomes, n=520 males Huttunen et al (2004) used single marker analysis and composite interval mapping. They found one QTL on chromosome 2, five QTL on chromosome 3 (not sure if there are only 2) and another QTL on chromosome 4. Małgorzata Bogdan Modified BIC Real Data Analysis (2) Drosophila "sing" by vibrating their wings. The most common song type is pulse song and consists of rapid transients (short-lived oscillations) of low frequency. Quantitative trait PN - number of pulses in a pulse train. Data - 24 markers on three chromosomes, n=520 males Huttunen et al (2004) used single marker analysis and composite interval mapping. They found one QTL on chromosome 2, five QTL on chromosome 3 (not sure if there are only 2) and another QTL on chromosome 4. We use mBIC supplied with Haley and Knott regression. We impute the genotypes inside intermarker intervals so the distance between tested positions does not exceed 10 cM. We penalize these imputed locations as real markers. In the results m = 59 and Ne = 1711. Małgorzata Bogdan Modified BIC Real Data Analysis (3) Małgorzata Bogdan Modified BIC Real Data Analysis (4) Zeng et al. (2000) data on the morphological differences between two species of Drosophila, Drosophila simulans and Drosophila mauritana Małgorzata Bogdan Modified BIC Real Data Analysis (4) Zeng et al. (2000) data on the morphological differences between two species of Drosophila, Drosophila simulans and Drosophila mauritana Trait - the size and the shape of the posterior lobe of the male genital arch, quantified by a morphometric descriptor. Małgorzata Bogdan Modified BIC Real Data Analysis (4) Zeng et al. (2000) data on the morphological differences between two species of Drosophila, Drosophila simulans and Drosophila mauritana Trait - the size and the shape of the posterior lobe of the male genital arch, quantified by a morphometric descriptor. n1 = 471, n2 = 491, m = 193, genotypes at neighboring positions are closely correlated, Ne = 18, 528 Małgorzata Bogdan Modified BIC Real Data Analysis, BM ds r ds r s s 0 20 Forward mBIC Zeng 40 60 s s s s s t 0 20 t t s t r s s 40 60 ud q ud t Małgorzata Bogdan r 80 v v v 100 q s r s q s 120 140 d q dt d r ds s s Modified BIC Forward mBIC Zeng s s s Forward mBIC Zeng Real Data Analysis, BS v s t t t t Forward mBIC Zeng 0 20 40 s s s vd v s t 0 20 40 60 s s t 60 s t s v s t v s t v Małgorzata Bogdan s s q r q 80 100 120 mBIC Zeng 140 s ds t s u t s v t t s t t s t t t Modified BIC Forward Forward mBIC Zeng Further work 1. Relaxing the penalty so as to control FDR instead of FWER, expected optimality for a wider range of values of p - with F. Frommlet, J. K. Ghosh, A. Chakrabarti and M. Murawska. 2. Application for association mapping - with F. Frommlet and M. Murawska. 3. Application for GLM and Zero Inflated Generalized Poisson Regression, with M.Zak, C. Czado, V. Earhardt. 4. Application for model selection in logic regression and comparison with Bayesian Regression Trees - with M. Malina, K. Ickstadt, H. Schwender. Małgorzata Bogdan Modified BIC References 1. Baierl, A., Bogdan, M., Frommlet, F., Futschik, A., 2006. On Locating multiple interacting quantitative trait loci in intercross designs. Genetics 173, 1693-1703. 2. Baierl, A., Futschik, A.,Bogdan, M.,Biecek, P., 2007. Locating multiple interacting quantitative trait loci using robust model selection, Computational Statistics and Data Analysis 51, 6423-6434. 3. Bogdan, M., Ghosh, J.K., Doerge, R.W., 2004. Modifying the Schwarz Bayesian Information Criterion to locate multiple interacting quantitative trait loci. Genetics 167, 989–999. 4. Bogdan, M., Frommlet, F., Biecek, P., Cheng, R., Ghosh, J. K., Doerge R. W. 2008 Extending the Modified Bayesian Information Criterion (mBIC) to dense markers and multiple interval mapping. Biometrics, doi: 10.1111/j.1541-0420.2008.00989.x. 5. Broman, K.W., Speed, T.P., 2002. A model selection approach for the identification of quantitative trait loci in experimental crosses. J. Roy. Stat. Soc. B 64, 641–656. 6. George, E.I., McCulloch, R.E., 1993. Variable Selection Via Gibbs Sampling. J. Amer. Statist. Assoc. 88 : 881-889. 7. Żak, M., Baierl, A., Bogdan, M., Futschik, A., 2007. Locating multiple interacting quantitative trait loci using rank-based model selection. Genetics 176, 1845-1854. Małgorzata Bogdan Modified BIC