Tutorial III LOD Scores, mQTLs and SNP Marker Analysis Steve Horvath, Anatole Ghazalpour, Sud Doss, Bin Zhang Correspondence: shorvath@mednet.ucla.edu, http://www.ph.ucla.edu/biostat/people/horvath.htm Abstract This tutorial is the direct sequel of tutorial II The tutorials and data files can be found at the following webpage: http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/MouseWeight/ More material on weighted network analysis can be found here http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/ The data and biological implications are described in Anatole Ghazalpour, Sudheer Doss, Bin Zang, Susanna Wang,Eric E. Schadt, Thomas A. Drake, Aldons J. Lusis, Steve Horvath (2006) Integrating Genetics and Network Analysis to Characterize Genes Related to Mouse Weight. PloS Genetics. We provide R code that shows a) how to relate clinical traits and expression profiles to genetic markers Quantitative trait locus, LODscore, QTL, linkage analysis, allelic association analysis, mQTL b) how to produce Figure 3 in our article (LOD score curve and mQTL graph) c) how to regress a phenotype on multiple markers with the fitqtl package (Broman et al 2003). The purpose of this tutorial is to demonstate how the body weight QTL, expression QTLs and Module QTLs (mQTLs) were calculated. We make use of the R/qtl package (Broman et al 2003) which is explained in Dr Broman’s tutorial at the following webpage. http://www.biostat.jhsph.edu/~kbroman/qtl/rqtltour.pdf To avoid duplication of effort, this tutorial is rather terse with regard to QTL analysis. Weight QTL analysis For bodyweight QTL analysis, in summary, we first read the genotype and phenotype data into the R software using read.cross function (make sure the R working directory is set to the correct directory): setwd("C:/Documents and Settings/shorvath/My Documents/ADAG/AnatoleGhazalpour/MOUSEWEIGHT/TUTORIALS/ThirdTutorial") library(qtl) # now we read in the genetic marker data from the F2 mouse cross (female BXHxC3H F2crossdata <- read.cross("csvr","", "F2mouseBXHxC3HGenotypes.csv") 1 # The following data frame contains the LOD scores for each gene (expression) across # the SNP markers. ExpressionLodScore=read.csv("ExpressionLODscoreFemaleLiver.csv",header=T) SNPtable=read.csv("SNPMarkerLocusTranslationTable.csv") colorh1=ExpressionLodScore$ModuleFemaleLiver #We defined a module quantitative trait locus (mQTL) as the locus with a significant #enrichment for eQTL of the genes within a predetermined gene module. # Here we specify the module of interest whichmodule="blue" LODscoreThreshold=2 NoSignifModuleGenes=apply(ExpressionLodScore[colorh1==whichmodule, -c(1:3)]> LODscoreThreshold,2,sum) NoSignifGenes=apply(ExpressionLodScore[, -c(1:3)]> LODscoreThreshold,2,sum) par(mfrow=c(2,1) ) plot(NoSignifGenes,type="h", col="black", xlab="Marker Number",main=paste("No. of network genes with epression LOD score >", LODscoreThreshold) ) plot(NoSignifModuleGenes,type="h", col=whichmodule, xlab="Marker Number", main= paste("No of", whichmodule, " module genes with expression LOD score >", LODscoreThreshold)) # This data frame contains the no. of significant genes for each SNP (rows) NoSignifGenesData=data.frame(UCSC.Name=names(NoSignifModuleGenes), NoSignifModuleGenes= as.vector(NoSignifModuleGenes), NoSignifGenes= as.vector(NoSignifGenes) ) # This outputs the SNPs with more for which more than 100 module genes # have a LOD score > LODscore threshold NoSignifGenesData[NoSignifGenesData$ NoSignifModuleGenes>100,] 2 UCSC.Name NoSignifModuleGenes NoSignifGenes 112 p45066 117 512 113 p46356 117 512 114 p45960 117 512 3 #To calculate the weight QTL we first run a genome scan using the scanone function: F2crossdata <- calc.genoprob(F2crossdata, error.prob=0.01) F2crossdata <- calc.genoprob(F2crossdata, error.prob=0.01) out.hk <- scanone(F2crossdata, method="hk") #To view the highest LOD scores for each chromosome type: summary(out.hk) p45504 p45236 p44989 p44990 p45558 p44960 p45323 p45103 p45722 p45653 p45738 p45039 p46019 p45050 p44593 p45947 p45618 p44804 p45500 chr 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 pos 142.7 144.3 160.0 155.2 118.5 125.6 136.7 13.7 36.3 40.6 9.4 79.6 71.0 116.8 78.2 3.6 26.4 73.8 50.4 lod 3.03 0.63 1.85 1.03 3.39 0.36 1.07 0.70 1.55 0.96 1.02 0.87 1.16 0.73 3.58 1.28 2.13 1.02 4.07 AA 39 39 36 36 35 38 38 38 40 38 37 40 36 40 41 40 35 40 40 AB 39 38 38 39 39 39 39 39 37 37 38 38 39 37 38 37 39 37 39 BB sigma 34 5.8 37 6.1 40 5.9 38 6.0 41 5.8 37 6.1 36 6.0 37 6.0 38 6.0 40 6.0 40 6.0 37 6.0 38 6.0 38 6.0 36 5.7 38 6.0 39 5.9 38 6.0 35 5.7 4 #To view the LOD score curve of body weight par(mfrow=c(1,1)) plot(out.hk,gap=0,col="black") 5 # Now we merge the 2 data sets datmerge=merge(NoSignifGenesData, data.frame(UCSC.Name= dimnames(out.hk)[[1]], out.hk), by.x="UCSC.Name", by.y="UCSC.Name",sort=F) dim(NoSignifGenesData) dim(out.hk) dim(datmerge) names(SNPtable) datmerge=merge(SNPtable, datmerge, by.x="UCSC.Name", by.y="UCSC.Name",sort=F) names(datmerge) dim(datmerge) attach(datmerge) 6 Since we were interested in studying the genetics of modules as opposed to the genetics of the entire network, when searching for mQTLs, we focused on module-specific eQTL hot spots. Toward this end, we used the Fisher exact test to determine whether the proportion of module genes that map to the eQTL hotspot was significantly higher than that of the 8,000 network genes. While we used the 8000 most varying genes in our paper, we only have 3421 in this tutorial. The following code shows how to pick mQTLs based on the 3421 genes. Naturally, the results will be slightly different from the analysis reported in Table 1 of the paper since it was based on the 8000 genes. # For the i-th SNP, # the following table reports the number of significant genes (first column) versus # non-significant genes (second column) for the module genes (first row) # and the non-module genes (second row). fisherpvalue=rep(NA, length(NoSignifGenes) ) for (i in c(1: length(NoSignifGenes) ) ) { tab1=matrix(NA, nrow=2,ncol=2) tab1[1,1]= NoSignifModuleGenes[i] tab1[1,2]=sum(colorh1==whichmodule)- tab1[1,1] tab1[2,1]= NoSignifGenes[i]- NoSignifModuleGenes[i] tab1[2,2]= length(colorh1)- sum(colorh1==whichmodule)-tab1[2,1] fisherpvalue[i]=fisher.test(tab1)$p.value } # This is the proportion of the number of significant module genes by no. of signif. genes. enrich1=NoSignifModuleGenes/ NoSignifGenes #To select the mQTLs based on the 3421 genes (not the 8000 genes mentioned in the paper), we use the following 3 criteria # this ensures that the eQTL hotspot (mQTL) is module specific (2 sided test) 1) fisherpvalue<0.0005 # This ensures that at least 20% of the significant genes are in the module at the mQTL 2) enrich1>.2 # this requires that at least 30 module genes map to the mQTL (LOD score threshold 2) 3) NoSignifModuleGenes>=30 selectmQTL=fisherpvalue<0.0005 & enrich1>.2 & NoSignifModuleGenes>=30 7 datout=data.frame(chromo=datmerge[,3], datmerge[,c(1:2,7,8)],FisherP=signif(fisherpvalue,2), Enrich=signif(enrich1,2) )[selectmQTL,] # As can be seen from following table, multiple SNPs on the same chromosome satisfy #the mQTL selection criteria. This is to be expected since SNPs that are close tend to be #in linkage disequilibrium. In our paper, we chose a single representative (highlighted in yellow) of every region. > datout chromo UCSC.Name Celera.Name NoSignifModuleGenes NoSignifGenes FisherP Enrich 49 1 p44629 rs3694065 34 119 2.5e-04 0.29 50 1 p44669 rs3717264 37 112 3.5e-06 0.33 51 1 p44901 rs3664662 37 111 3.1e-06 0.33 52 1 p45733 rs3658965 37 112 3.5e-06 0.33 53 1 p46102 rs3715814 37 121 3.0e-05 0.31 54 1 p45343 rs3725808 35 106 7.3e-06 0.33 55 1 p45507 rs3719973 34 98 2.7e-06 0.35 56 1 p45920 rs3659590 39 110 1.7e-07 0.35 57 1 p45182 rs3717377 39 110 1.7e-07 0.35 58 1 p45788 rs3677375 38 108 4.7e-07 0.35 59 1 p45339 rs3678662 38 108 4.7e-07 0.35 112 2 p45066 rs3655184 117 512 2.5e-06 0.23 113 2 p46356 rs3662347 117 512 2.5e-06 0.23 114 2 p45960 rs3707138 117 512 2.5e-06 0.23 188 3 p45967 rs4223971 30 64 2.6e-09 0.47 193 3 p45694 rs3715204 32 79 6.1e-08 0.41 194 3 p45241 rs3714575 54 110 5.2e-17 0.49 195 3 p46112 rs3717453 54 110 5.2e-17 0.49 196 3 p46111 rs3680433 54 110 5.2e-17 0.49 197 3 p46033 rs3671511 54 110 5.2e-17 0.49 198 3 p45546 rs3714671 55 115 1.2e-16 0.48 199 3 p44944 rs3705680 55 115 1.2e-16 0.48 200 3 p45317 rs3706170 33 73 1.3e-09 0.45 201 3 p45911 rs3707356 33 73 1.3e-09 0.45 202 3 p45620 rs3704498 33 73 1.3e-09 0.45 230 3 p46075 rs3698974 35 115 4.0e-05 0.30 231 3 p45544 rs3679962 35 115 4.0e-05 0.30 330 5 p46116 rs3688942 37 117 1.1e-05 0.32 331 5 p46327 rs3713492 37 117 1.1e-05 0.32 332 5 p45414 rs3713195 31 105 2.9e-04 0.30 333 5 p46249 rs3707624 34 115 1.1e-04 0.30 334 5 p45879 rs3679606 34 115 1.1e-04 0.30 335 5 p45249 rs3711950 36 118 4.3e-05 0.31 336 5 p44714 rs3656989 33 106 5.3e-05 0.31 337 5 p46273 rs3671517 35 122 1.8e-04 0.29 338 5 p44953 rs3684754 35 122 1.8e-04 0.29 339 5 p45322 rs3718145 37 126 7.7e-05 0.29 340 5 p46130 rs4138743 37 123 3.7e-05 0.30 341 5 p45840 rs3672514 37 126 7.7e-05 0.29 347 5 p44956 rs3721607 44 162 1.3e-04 0.27 348 5 p45839 rs3667334 44 162 1.3e-04 0.27 349 5 p44954 rs3688276 44 162 1.3e-04 0.27 350 5 p46184 rs3667067 44 162 1.3e-04 0.27 351 5 p45247 rs3686635 44 162 1.3e-04 0.27 352 5 p45750 rs3686672 44 162 1.3e-04 0.27 353 5 p45561 rs3716445 44 162 1.3e-04 0.27 612 10 p46016 rs3659700 34 76 1.0e-09 0.45 613 10 p46367 rs3660611 34 76 1.0e-09 0.45 614 10 p46411 rs3708275 39 99 5.3e-09 0.39 615 10 p46210 rs3673471 38 73 2.6e-13 0.52 616 10 p45930 rs3708450 40 80 3.4e-13 0.50 617 10 p46344 rs3693867 46 108 8.1e-12 0.43 618 10 p45651 rs3680457 42 92 4.3e-12 0.46 619 10 p44569 rs3677768 42 91 2.7e-12 0.46 620 10 p45928 rs3688363 54 91 4.1e-22 0.59 621 10 p45139 rs3679120 39 84 1.4e-11 0.46 8 622 623 625 626 627 628 629 630 631 632 633 637 638 705 706 707 708 709 710 711 712 713 714 715 722 723 724 725 726 727 728 729 730 731 732 733 756 766 767 768 769 824 825 826 827 828 829 830 831 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 10 10 10 10 10 10 10 10 10 10 10 10 10 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 13 13 13 13 14 14 14 14 14 14 14 14 19 19 19 19 19 19 19 19 19 19 19 19 p45511 p45653 p45025 p45347 p44731 p44673 p45023 p45797 p44526 p44560 p46293 p45026 p45863 p44633 p45364 p45517 p45659 p45588 p45144 p45363 p45362 p45464 p45143 p44635 p46266 p46225 p45126 p45194 p46054 p46227 p46219 p45358 p45937 p45356 p45802 p45196 p45801 p45369 p45438 p45664 p45523 p46296 p45807 p45207 p45439 p44590 p44591 p46384 p44919 p46074 p45826 p46355 p46124 p46123 p45229 p45500 p44890 p44892 p45916 p45915 p44811 rs3672342 rs3719045 rs3696307 rs3682060 rs3676909 rs3723140 rs3661840 rs3690226 rs3679452 rs3667599 rs3721306 rs3704401 rs3694833 rs3653990 rs4138084 rs3678128 rs3709002 rs3707458 rs3708624 rs3717933 rs3693390 rs3658504 rs3694821 rs3709287 rs3689251 rs3669704 rs3695382 rs3658204 rs3661834 rs3716646 rs3683481 rs3671194 rs3686437 rs3655309 rs3663333 rs3663603 rs3663172 rs3711424 rs4139503 rs3720707 rs3707097 rs3681670 rs3153444 rs3707560 rs3688833 rs3691821 rs3710549 rs3666026 rs3721918 rs3653886 rs3697994 rs3697139 rs3677115 rs3656289 rs3658160 mCV22979960 mCV22979964 mCV23068522 mCV23069037 mCV22608007 rs3722335 56 47 49 62 62 35 31 34 31 31 31 34 32 52 44 44 33 33 33 33 33 37 37 35 31 32 32 32 32 51 51 47 47 31 31 34 40 39 39 39 49 44 55 64 64 75 63 63 58 51 50 50 50 54 71 49 49 34 34 34 34 91 81 135 145 145 79 77 92 77 77 77 98 102 109 81 81 68 68 68 68 65 83 83 89 60 68 68 68 68 101 101 98 98 64 64 60 117 108 108 108 126 71 90 100 100 132 116 109 99 75 67 67 67 80 105 78 78 57 57 57 56 5.0e-24 8.8e-19 1.6e-09 1.0e-15 1.0e-15 7.9e-10 1.2e-07 4.6e-07 1.2e-07 1.2e-07 1.2e-07 2.7e-06 4.4e-05 9.7e-16 3.7e-16 3.7e-16 1.2e-10 1.2e-10 1.2e-10 1.2e-10 2.6e-11 2.0e-10 2.0e-10 3.5e-08 5.9e-11 6.5e-10 6.5e-10 6.5e-10 6.5e-10 8.8e-17 8.8e-17 2.0e-14 2.0e-14 4.9e-10 4.9e-10 1.9e-13 3.8e-07 9.9e-08 9.9e-08 9.9e-08 8.8e-11 3.2e-19 2.1e-23 7.8e-29 7.8e-29 8.0e-29 8.6e-23 8.2e-25 2.7e-23 6.1e-25 1.6e-27 1.6e-27 1.6e-27 4.0e-26 2.5e-34 1.1e-21 1.1e-21 2.3e-14 2.3e-14 2.3e-14 1.1e-14 0.62 0.58 0.36 0.43 0.43 0.44 0.40 0.37 0.40 0.40 0.40 0.35 0.31 0.48 0.54 0.54 0.49 0.49 0.49 0.49 0.51 0.45 0.45 0.39 0.52 0.47 0.47 0.47 0.47 0.50 0.50 0.48 0.48 0.48 0.48 0.57 0.34 0.36 0.36 0.36 0.39 0.62 0.61 0.64 0.64 0.57 0.54 0.58 0.59 0.68 0.75 0.75 0.75 0.68 0.68 0.63 0.63 0.60 0.60 0.60 0.61 Comment: Note that this analysis involving the 3421 genes suggests to define mQTLs for chromosomes 1 and 13 as well. This could lead to an integrated models (see tutorial IV) that provide an even better fit than the one reported in our paper. But here we will proceed with the mQTLs defined in our paper, i.e we proceed with the following SNPs 9 mQTLnames=c("rs3662347","rs3714671","rs3721607","rs3676909", "rs3704401", "rs3658504","rs3683481","rs3691821","rs3658160") # this vector indicates the locations of the mQTLS used in our paper LocationmQTL=is.element(Celera.Name, mQTLnames) # This vector assigns color red to mQTLs and black for other SNPs ColormQTL=ifelse(LocationmQTL,"red","black") 10 # Here we compare the number of module eQTLs to the the total number of eQTLs par(mfrow=c(1,1)) jitterNoSignifGenes=jitter(NoSignifGenes,amount=8) plot(NoSignifModuleGenes, jitterNoSignifGenes,xlab="No. Signif *Module* Genes at a SNP", ylab="No. Signif Genes at a SNP", type="n",cex=.75) text(NoSignifModuleGenes, jitterNoSignifGenes, labels=Celera.Chromosome ,col=ColormQTL,cex=.75) lm1=lm(NoSignifGenes~ poly( NoSignifModuleGenes,2)) x= seq(from=min(NoSignifModuleGenes),to= max(NoSignifModuleGenes) , length.out=1000) modelprediction = predict(lm1, data.frame(NoSignifModuleGenes=x)) lines(x, modelprediction) # The SNPs in the plot above are labelled by chromosome location. mQTLs are colored in red. 11 # Here we compare the number of module eQTLs to the the total number of eQTLs par(mfrow=c(1,1)) plot(NoSignifModuleGenes[LocationmQTL], NoSignifGenes[LocationmQTL],xlab="No. Signif *Module* Genes at a SNP", ylab="No. Signif Genes at a SNP", main="mQTL only", type="n") text(NoSignifModuleGenes[LocationmQTL], NoSignifGenes[LocationmQTL], col="red", labels= Celera.Chromosome[LocationmQTL]) lines(x, modelprediction) mQTL only 400 300 200 5 10 100 No. Signif Genes at a SNP 500 2 12 10 3 14 19 12 40 60 80 100 120 No. Signif *Module* Genes at a SNP 12 # As a pre-processing step for producing Figure 3, we create a position score (pos1) for #the x-axis. # the following code assumes that the chromosomes are sorted in order 1-19 # and that pos1 reports the marker distance on each chromosome pos1=pos chr1=as.numeric( chr) tickmarkposition1=rep(NA,length(unique(chr1))) chromonumber=1 tickmarkposition1[1]=pos1[chromonumber] for (i in c(2:(length(chr1)-1) ) ) { if (chr1[i] !=chr1[i-1] ) { chromonumber=chromonumber+1 gapbetweenchromosomes=30 pos1[i:length(chr1)]= pos1[i:length(chr1)]+pos1[i-1] –pos1[i]+ gapbetweenchromosomes tickmarkposition1[chromonumber]=pos1[i] } } tickmarkposition1 plot(pos1,col=chr1) # This plot shows that the position (pos1) of the SNP markers is increasing. No surprise here since this is how we defined pos1.It also shows a small vertical gap between markers at the end of "adjacenct" chromosomes. The markers have been colored by chromosome (1=black, 2=red, 3=green, 4=blue, etc) 13 # The following function is used to produce Figure 3 # whichchromosomes1=the chromosomes that should be used #normalize count=divide the number of significant genes by this factor if (exists("PlotQTLCountplusQTLbars" )) rm(PlotQTLCountplusQTLbars); PlotQTLCountplusQTLbars =function(NoSignificantGenes, chromosome,LODscore, MarkerLocation,NormalizationFactorCount=1, whichchromosomes1=c(1:19) ,color1="grey", LocationmQTL) { NormalizedCount= NoSignificantGenes/ NormalizationFactorCount rest1=is.element(chromosome, whichchromosomes1) ylim1=c(0, max( c(NormalizedCount, LODscore )) ) plot(MarkerLocation[rest1], NormalizedCount[rest1],ylim=ylim1, xlab="Mouse Chromosome",ylab=paste("LOD Score") , cex.axis=1.5,cex.lab=1.2, type="h", xaxt="n", yaxt="n",col=color1) text( c(MarkerLocation[rest1])[c( LocationmQTL[rest1])], 0, labels="*",cex=3,col="red" ) mtext(paste("No.Signif.Genes/", NormalizationFactorCount), side=4,cex=1.5) for (i in whichchromosomes1) { rest2=chromosome==i lines(MarkerLocation[rest2], LODscore[rest2],col="black",lwd=2) axis(2) # put numerical annotations at the tickmarks in y-axis; axis(1, tickmarkposition1[whichchromosomes1], labels=whichchromosomes1,cex.axis=1.5) } # end of for loop } # end of function PlotQTLCountplusQTL # Now we call the function on our count data par(mfrow=c(2,1),mar=c(4,4,2,1)+.1) PlotQTLCountplusQTLbars(NoSignificantGenes= NoSignifModuleGenes, chromosome=chr1,LODscore=lod, MarkerLocation= pos1, NormalizationFactorCount=20, whichchromosomes1=c(1:10),color1="blue", LocationmQTL= LocationmQTL ) PlotQTLCountplusQTLbars(NoSignificantGenes= NoSignifModuleGenes, chromosome=chr1,LODscore=lod, MarkerLocation= pos1, NormalizationFactorCount=20, whichchromosomes1=c(11:19) ,color1="blue", LocationmQTL= LocationmQTL) 14 # This is Figure 3 in our article. Blue Module mQTL Profile Partially Overlaps with the Physiological Trait (Weight) QTL The black curves represent the QTL analysis across the genome for mouse body weight using a single marker genome scan according to the LOD score scale to the right. Significant linkage for weight was found on Chromosome 1 (50.4 Mb, LOD = 4.07), Chromosome 5 (118.5 Mb, LOD = 3.39), Chromosome 15 (78.2 Mb, LOD = 3.58), and Chromosome 19 (50.4 Mb, LOD = 4.07). For each marker, the blue vertical line represents the number (divided by 20) of significant Blue module genes, i.e., the number of genes whose single-point LOD at this marker is bigger than 2. For example, there is a SNP marker on Chromosome 2 at which 117 Blue module genes have a LOD bigger than 2. The red stars denote the locations of the mQTLs reported in Table 1. 15 #The following code provides an alternative view of the same curves. if (exists("PlotQTLCountplusQTL" )) rm(PlotQTLCountplusQTL); PlotQTLCountplusQTL =function(NoSignificantGenes, chromosome,LODscore, MarkerLocation,NormalizationFactorCount=1, whichchromosomes1=c(1:19) ) { NormalizedCount= NoSignificantGenes/ NormalizationFactorCount rest1=is.element(chromosome, whichchromosomes1) ylim1=c(0, max( c(NormalizedCount, LODscore )) ) plot(MarkerLocation[rest1], NormalizedCount[rest1],ylim=ylim1,col=chromosome[rest1], xlab="Marker Location",ylab=paste("LOD or No.Signif.Genes/", NormalizationFactorCount) , cex.axis=1.5,cex.lab=1.2, type="n", xaxt="n", yaxt="n") text (MarkerLocation[rest1], NormalizedCount[rest1],label=chromosome[rest1],cex=.75,col=chromosome[rest1]) for (i in whichchromosomes1) { rest2=chromosome==i lines(MarkerLocation[rest2], LODscore[rest2],col=chromosome[rest2],lwd=2) axis(2) # put numerical annotations at the tickmarks in y-axis; axis(1, tickmarkposition1[whichchromosomes1], labels=whichchromosomes1,cex.axis=1.5) } # end of for loop } # end of function PlotQTLCountplusQTL par(mfrow=c(2,1),mar=c(3,4,2,1)+.1) PlotQTLCountplusQTL(NoSignificantGenes= NoSignifModuleGenes, chromosome=chr1,LODscore=lod, MarkerLocation= pos1, NormalizationFactorCount=20, whichchromosomes1=c(1:10) ) PlotQTLCountplusQTL(NoSignificantGenes= NoSignifModuleGenes, chromosome=chr1,LODscore=lod, MarkerLocation= pos1, NormalizationFactorCount=20, whichchromosomes1=c(11:19) ) 16 17 #Recall the LOD score curve for body weight plot(out.hk,gap=0,col="black") 18 This analysis shows that there are four QTLs with suggestive LOD scores for body weight on chromosomes 1,5,15,and 19. To see if there are interactions between these four QTLs one can use the fitqtl function in the qtl package. To know about the details of this function please visit http://www.biostat.jhsph.edu/~kbroman/qtl/ . In summary, this function allows one to fit linear models in which the marker locations are used as independent variables. In our example, we could use the position of the peak markers on chromosomes 1, 5, 15, and 19 as the independent variables. One could also use interaction terms between these loci to look for evidence of interaction between the corresponding QTLs. The output of the fitqtl function will give the p-values associated with each of the terms used in the model. This allows one to take the insignificant terms out and rerun the fitqtl function on the model with the significant terms. To run the fitqtl function on the full model use the following code: F2crossdata <- sim.geno(F2crossdata, step=2, n.draws=50) chr<-c(1,5,15,19) pos<-c(143,118,78,50) qtl<-makeqtl(F2crossdata,chr,pos) my.formula<-y~Q1+Q2+Q3+Q4+Q1*Q2+Q1*Q3+Q1*Q4+Q2*Q3+Q2*Q4+Q3*Q4 out.fitqtl <- fitqtl(F2crossdata$pheno[,1], qtl, formula=my.formula) summary(out.fitqtl) Summary for fit QTL Method is: imp Number of observations: 132 Full model result ---------------------------------Model formula is: y ~ Q1 + Q2 + Q3 + Q4 + Q1 * Q2 + Q1 * Q3 + Q1 * Q4 + Q2 * Q3 + Model formula is: Q2 * Q4 + Q3 * Q4 df SS MS LOD %var Pvalue(Chi2) Pvalue(F) Model 32 2592.209 81.00652 22.06625 53.69127 3.578178e-09 6.073403e-07 Error 99 2235.781 22.58364 Total 131 4827.989 Drop one QTL at a time ANOVA table: ---------------------------------df Type III SS LOD %var F value Pvalue(Chi2) Pvalue(F) Chr1@143 14 806.0401 8.8245 16.6951 2.5494 0.000203 0.00368 ** Chr5@118 14 398.7324 4.7039 8.2588 1.2611 0.086 0.24540 Chr15@78 14 813.9587 8.8990 16.8592 2.5744 0.000179 0.00337 ** Chr19@50 14 835.8985 9.1045 17.3136 2.6438 0.000127 0.00263 ** Chr1@143:Chr5@118 4 101.4772 1.2723 2.1019 1.1233 0.210 0.34991 Chr1@143:Chr15@78 4 334.1523 3.9925 6.9211 3.6991 0.001 0.00753 ** Chr1@143:Chr19@50 4 190.4327 2.3430 3.9443 2.1081 0.029 0.08544 . Chr5@118:Chr15@78 4 25.3292 0.3229 0.5246 0.2804 0.829 0.89005 Chr5@118:Chr19@50 4 116.2888 1.4534 2.4086 1.2873 0.153 0.28019 Chr15@78:Chr19@50 4 118.8034 1.4840 2.4607 1.3151 0.145 0.26963 --Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Note: from this we can conclude that there may be interactions between chromosome 1 and 15 (F statistic p=0.00753) and chromosomes 1 and 19 (F statistic p= 0.08544). Now 19 we go back and instead of using the full model we use the the model which has all the additive parameters and two interaction terms for chromosomes1 with 15 and 1 with 19. To do this copy and paste the following code into R: F2crossdata <- sim.geno(F2crossdata, step=2, n.draws=50) chr<-c(1,5,15,19) pos<-c(143,118,78,50) qtl<-makeqtl(F2crossdata,chr,pos) my.formula<-y~Q1+Q2+Q3+Q4+Q1*Q3+Q1*Q4 out.fitqtl <- fitqtl(F2crossdata$pheno[,1], qtl, formula=my.formula) summary(out.fitqtl) Summary for fit QTL Method is: imp Number of observations: 132 Full model result ---------------------------------Model formula is: y ~ Q1 + Q2 + Q3 + Q4 + Q1 * Q3 + Q1 * Q4 df SS MS LOD %var Pvalue(Chi2) Pvalue(F) Model 16 2178.838 136.17738 17.20353 45.12931 2.294350e-10 3.004679e-09 Error 115 2649.151 23.03610 Total 131 4827.989 Drop one QTL at a time ANOVA table: ---------------------------------df Type III SS LOD %var F Chr1@143 10 751.687 7.160 15.569 Chr5@118 2 161.825 1.700 3.352 Chr15@78 6 748.162 7.130 15.496 Chr19@50 6 662.310 6.396 13.718 Chr1@143:Chr15@78 4 374.601 3.791 7.759 Chr1@143:Chr19@50 4 245.341 2.539 5.082 --Signif. codes: 0 '***' 0.001 '**' 0.01 '*' value Pvalue(Chi2) Pvalue(F) 3.263 2.75e-04 0.000961 *** 3.512 0.020 0.033064 * 5.413 1.13e-05 5.85e-05 *** 4.792 4.99e-05 0.000213 *** 4.065 0.002 0.004046 ** 2.663 0.020 0.036070 * 0.05 '.' 0.1 ' ' 1 Note that 45% of the variation in weight phenotype can be explained by the variables of the model that only has the four loci detected using the scanone function and the two interaction terms detected by the fitqtl function. If one uses the model for which there are no interaction terms the model would only explain 34% of the weight variation as can be seen from the following output 20 F2crossdata <- sim.geno(F2crossdata, step=2, n.draws=50) chr<-c(1,5,15,19) pos<-c(143,118,78,50) qtl<-makeqtl(F2crossdata,chr,pos) my.formula<-y~Q1+Q2+Q3+Q4 out.fitqtl <- fitqtl(F2crossdata$pheno[,1], qtl, formula=my.formula) summary(out.fitqtl) Summary for fit QTL Method is: imp Number of observations: 132 Full model result ---------------------------------Model formula is: y ~ Q1 + Q2 + Q3 + Q4 df SS MS LOD %var Pvalue(Chi2) Pvalue(F) Model 8 1664.924 208.11546 12.12143 34.48483 3.056179e-09 9.729804e-09 Error 123 3163.066 25.71598 Total 131 4827.989 Drop one QTL at a time ANOVA table: ---------------------------------df Type III SS LOD %var F value Pvalue(Chi2) Pvalue(F) Chr1@143 2 224.329 1.964 4.646 4.362 0.011 0.01479 * Chr5@118 2 292.051 2.531 6.049 5.678 0.003 0.00438 ** Chr15@78 2 339.305 2.921 7.028 6.597 0.001 0.00190 ** Chr19@50 2 339.843 2.925 7.039 6.608 0.001 0.00188 ** THE END 21