Tutorial: Gene-Screening Strategies

Tova Fuller, Steve Horvath
Correspondence: suprtova@ucla.edu, shorvath@mednet.ucla.edu

Abstract

Here we identify genes potentially involved in mouse obesity by using GSweight (the absolute correlation of a gene with mouse body weight), GSSNP19 (the absolute correlation of a gene with an mQTL on chromosome 19), and kME (module eigengene-based intramodular connectivity) to prioritize genes inside the blue module of a previously studied BxH F2 mouse intercross. This work is in press and appears in Table 3 of:

Tova Fuller, Anatole Ghazalpour, Jason Aten, Thomas A. Drake, Aldons J. Lusis, Steve Horvath (2007) Weighted gene coexpression network analysis strategies applied to mouse weight. Mamm Genome, in press.

The data are described in:

Anatole Ghazalpour, Sudheer Doss, Bin Zhang, Susanna Wang, Eric E. Schadt, Thomas A. Drake, Aldons J. Lusis, Steve Horvath (2006) Integrating Genetics and Network Analysis to Characterize Genes Related to Mouse Weight. PLoS Genetics.

This document and the data files can be found at the following webpage:
http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/DifferentialNetworkAnalysis

More material on weighted network analysis can be found here:
http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/

Method Description

The data are described in the PLoS Genetics article cited above. We use four criteria for choosing genes, each of which identifies 8-9 candidate genes. We review the genes selected by each method, noting putative links to obesity-related syndromes such as hypertension, hypercholesterolemia and insulin resistance, based on an initial gene ontology database search followed by a brief literature search.
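Each criterion outlined below combines three cutoffs, one per measure. As a minimal sketch of that idea, the R code below simulates a small expression data set (not the mouse data), computes the three measures, and keeps the genes that exceed chosen percentiles of all three. The simulated data, the 0/1/2 marker and the helper name screenGenes are hypothetical. The eigengene is taken as the first principal component of the standardized expression data via prcomp(), rather than with the ModulePrinComps1 function used in the tutorial code, and all three cutoffs here are quantile-based, whereas the tutorial code uses a rank-based cutoff for GSSNP19.

```r
# Hypothetical toy example with simulated data (not the BxH mouse data).
set.seed(1)
nSamples <- 50
nGenes <- 40

# A shared "module" signal plus gene-specific noise, so the genes are coexpressed
hub <- rnorm(nSamples)
datExpr <- sapply(seq_len(nGenes), function(j) runif(1, 0.3, 1) * hub + rnorm(nSamples))
colnames(datExpr) <- paste0("gene", seq_len(nGenes))

weight <- hub + rnorm(nSamples)                # simulated trait
snp <- sample(0:2, nSamples, replace = TRUE)   # simulated marker coded 0/1/2

# Gene significance measures: absolute correlations with the trait and the marker
GSweight <- abs(cor(weight, datExpr))[1, ]
GSSNP <- abs(cor(snp, datExpr))[1, ]

# Module eigengene taken as the first principal component of the standardized data;
# kME is the absolute correlation of each gene with it
ME <- prcomp(scale(datExpr))$x[, 1]
kME <- abs(cor(ME, datExpr))[1, ]

# Keep genes that lie above the given percentile of every measure
screenGenes <- function(GSweight, GSSNP, kME, qWeight, qSNP, qKME) {
  GSweight > quantile(GSweight, qWeight) &
    GSSNP > quantile(GSSNP, qSNP) &
    kME > quantile(kME, qKME)
}

balanced <- screenGenes(GSweight, GSSNP, kME, 0.85, 0.85, 0.85)       # like Criterion 4
strictWeight <- screenGenes(GSweight, GSSNP, kME, 0.975, 0.75, 0.75)  # like Criterion 1
names(which(balanced))
sum(strictWeight)
```

Because each cutoff keeps only the genes above a percentile of its own measure, a stricter percentile for one measure (as in the 97.5%ile GSweight cutoff) shrinks the candidate list regardless of how relaxed the other two cutoffs are.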
Below is an outline of the initial four criteria for choosing genes:

Criterion 1: Strict GSweight thresholding (8 genes identified)
  - GSweight threshold at the 97.5%ile value
  - GSSNP19 threshold at the 75%ile value
  - kME threshold at the 75%ile value
Criterion 2: Strict GSSNP19 thresholding (9 genes identified)
  - GSweight threshold at the 75%ile value
  - GSSNP19 threshold at the 90%ile value
  - kME threshold at the 75%ile value
Criterion 3: Strict kME thresholding (8 genes identified)
  - GSweight threshold at the 75%ile value
  - GSSNP19 threshold at the 75%ile value
  - kME threshold at the 95%ile value
Criterion 4: Balanced thresholding (9 genes identified). This corresponds to Table 3 of Fuller et al.
  - GSweight threshold at the 85%ile value
  - GSSNP19 threshold at the 85%ile value
  - kME threshold at the 85%ile value

We then created a fifth and final criterion that includes the most relevant genes found by the previous four criteria.

Criterion 5: Strict GSweight and kME thresholding, relaxed GSSNP19 thresholding (16 genes identified)
  - GSweight threshold at the 85%ile value
  - GSSNP19 threshold at the 75%ile value
  - kME threshold at the 85%ile value

Appendix 1 contains tables of the genes that meet each of these criteria. Appendix 2 contains gene ontology information for all genes recovered by these screening strategies.

CODE TUTORIAL

# In this tutorial, I will demonstrate how to determine the genes with the highest
# significance for the chromosome 19 mQTL. GSSNP, the SNP-based gene significance
# measure, is the absolute value of the correlation between the expression of a gene
# and the SNP value (coded 0, 1 or 2).

# Please adapt the following paths
setwd("/Users/TovaFuller/Documents/HorvathLab2007/MouseProject2.0/GeneSearch/")
source("/Users/TovaFuller/Documents/HorvathLab2006/NetworkFunctions/NetworkFunctions.txt")
# read in the R libraries.
library(MASS) # standard, no need to install
library(class) # standard, no need to install
library(cluster)
library(sma) # install it for the function plot.mat
library(impute) # install it for imputing missing values
# library(faraway) # this library is useful for some of the linear model diagnostics

# Read in expression data related to the blue module
dat1=read.csv("/Users/TovaFuller/Documents/HorvathLab2006/MouseTutorials/Tutorial4/BluemoduleGenesWeightandSNPs.csv", header=TRUE)
# Rows 1-9 of dat1 hold the mQTL SNP markers, row 10 holds body weight and the
# remaining rows hold the genes; columns 9-143 hold the 135 mouse samples.
# This data frame contains annotation and other information on the genes
datSummary=data.frame(dat1[-c(1:10), c(1:8, 144:158)])
# This data frame contains the gene expression data (rows are samples, columns are
# genes)
datExprBlue=data.frame(t(dat1[-c(1:10), c(9:143)]))
# This vector contains the module color (blue) for each gene
color1=rep("blue", dim(datExprBlue)[[2]])
# This defines the module eigengene (ModulePrinComps1 comes from the sourced
# NetworkFunctions.txt)
PC1=ModulePrinComps1(datExprBlue, color1)[[1]]$PCblue
# This data frame contains the SNP markers of the mQTLs for the mice
# Rows are mQTL SNP markers and columns are female mouse liver samples
SNP=data.frame(dat1[1:9, c(9:143)])
dimnames(SNP)[[1]]=as.character(dat1[1:9,1])
# body weight of each mouse
weight=as.numeric(dat1[10, c(9:143)])
# This defines the weight-based gene significance measure
GSweight=as.numeric(abs(cor(weight, datExprBlue, use="p")))
# This defines the SNP (mQTL) based gene significance measure
GSSNP=data.frame(matrix(NA, nrow=dim(SNP)[[1]], ncol=dim(datExprBlue)[[2]]))
for (i in c(1:dim(SNP)[[1]])) {
  GSSNP[i,]=as.numeric(abs(cor(as.numeric(SNP[i,]), datExprBlue, use="p")))
}
dimnames(GSSNP)[[1]]=paste("GS", as.character(dat1[1:9,1]), sep="")
dimnames(GSSNP)[[2]]=paste(as.character(dat1[-c(1:10),1]))
dim(GSSNP)
# Row 9 corresponds to the chromosome 19 mQTL
GSSNP19=GSSNP[9,]
dimnames(GSSNP19)[[2]]=paste(as.character(dat1[-c(1:10),1]))
# This defines the intramodular connectivity
# Note that this assumes beta=6 used for the power adjacency function
kIN=as.numeric(apply(abs(cor(datExprBlue, use="p"))^6, 2, sum))
# (The column sums include each gene's correlation with itself, so every kIN value
# is exactly 1 larger than the sum over the other genes only.)
# This defines the module eigengene-based connectivity measure.
kME=as.numeric(abs(cor(PC1, datExprBlue, use="p")))
# Note that kME and kIN are highly correlated, which is typically the case for genes
# inside a module. See Horvath, Dong, Yip 2006.
cor.test(kME, kIN)
#
#         Pearson's product-moment correlation
#
# data:  kME and kIN
# t = 38.4575, df = 533, p-value < 2.2e-16
# alternative hypothesis: true correlation is not equal to 0
# 95 percent confidence interval:
#  0.8331551 0.8783076
# sample estimates:
#       cor
# 0.8573722

# There are different criteria we might use for choosing genes: 1. GSweight (the
# absolute correlation between expression and our phenotype of interest), 2. GSSNP
# (the absolute correlation between expression and the numerical SNP value), 3. kME
# or kIN (our connectivity measures). A priori there is no hard and fast mathematical
# rule for choosing genes based on these criteria. We therefore try several methods
# and then consult the literature available to date to judge which method selects
# the most biologically relevant subgroup.
# We try 4 different methods: 1. stringent GSweight, 2. stringent GSSNP,
# 3. stringent connectivity and 4. a balanced approach. After finding genes with
# each method, we assess each subgroup subjectively for the biological relevance of
# the genes chosen. I have also added a fifth method: stringent GSweight and kME
# thresholds combined with a relaxed GSSNP threshold.
# Now we wish to find the genes with the highest GSSNP for mQTL19.047. This is in
# row 9 of the data frame GSSNP.
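In the tutorial code, GSSNP19 is row 9 of the data frame GSSNP, i.e. a one-row data frame rather than a numeric vector. The minimal sketch below uses a small simulated data frame (the name GSSNPtoy is hypothetical, and the values are random, not the mouse data) to show that a single row keeps the data.frame class, that unlist() turns it into a named numeric vector suitable for quantile(), and how a top-k rank cutoff, like the rank(-GSSNP19) cutoffs used in the criteria, relates to a quantile cutoff.

```r
# Hypothetical toy stand-in for GSSNP: one row per marker, one column per gene
# (simulated values, not the mouse data)
set.seed(2)
GSSNPtoy <- data.frame(matrix(runif(3 * 20), nrow = 3))
colnames(GSSNPtoy) <- paste0("gene", 1:20)

# Extracting one row keeps the data.frame class, which is why quantile() applied
# directly to such a row can fail
is.data.frame(GSSNPtoy[3, ])

# unlist() turns the row into a named numeric vector that quantile() accepts
row3 <- unlist(GSSNPtoy[3, ])
q75 <- quantile(row3, 0.75)

# A rank-based cutoff keeps the top k genes. When n * 0.25 is a whole number and
# there are no ties, it selects the same genes as a 75th-percentile value cutoff.
k <- round(length(row3) * 0.25)
topByRank <- rank(-row3) <= k
topByValue <- row3 > q75
table(topByRank, topByValue)
```

With 535 genes, n * 0.25 is not a whole number, so a rank-based cutoff such as the top 134 genes only approximates the 75th percentile.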
quantile(GSweight, probs=c(0.75, 0.8, 0.825, 0.85, 0.9, 0.95, 0.975))
#       75%       80%     82.5%       85%       90%       95%     97.5%
# 0.4959823 0.5123367 0.5260120 0.5407020 0.5663083 0.6034069 0.6319880
quantile(GSSNP19)
# Error in sort(x, partial = unique(c(lo, hi))) :
#   'x' must be atomic
# The error arises because GSSNP19 is a one-row data frame rather than a numeric
# vector; quantile(as.numeric(GSSNP19)) should avoid it. Here I used a rank-based
# work-around instead: with 535 genes in the module, keeping a fixed number of
# top-ranked genes approximates each percentile cutoff.
535*0.25
# [1] 133.75 - for the 75%ile, keep the top 134 genes
535*0.2
# [1] 107 - for the 80%ile, keep the top 107 genes
535*0.15
# [1] 80.25 - for the 85%ile, keep the top 80 genes
535*0.1
# [1] 53.5 - for the 90%ile, keep the top 53 genes
quantile(kME, probs=c(0.75, 0.8, 0.825, 0.85, 0.9, 0.95))
#       75%       80%     82.5%       85%       90%       95%
# 0.7111583 0.7245374 0.7347036 0.7514269 0.7745479 0.8024132

# Thresholding:
# 1. Stringent GSweight cutoff: Let's choose the 97.5%ile for GSweight, the 75%ile
# for GSSNP19 and the 75%ile for kME as preliminary thresholds.
criteria1=GSweight > 0.631 & rank(-GSSNP19) <= 134 & kME > 0.711
table(criteria1)
# criteria1
# FALSE  TRUE
#   527     8
# 2. Stringent GSSNP19 cutoff: Let's choose the 75%ile for GSweight, the 90%ile for
# GSSNP19 and the 75%ile for kME as preliminary thresholds.
criteria2=GSweight > 0.496 & rank(-GSSNP19) <= 53 & kME > 0.711
table(criteria2)
# criteria2
# FALSE  TRUE
#   526     9
# 3. Stringent kME cutoff: Let's choose the 75%ile for GSweight, the 75%ile for
# GSSNP19 and the 95%ile for kME as preliminary thresholds.
criteria3=GSweight > 0.496 & rank(-GSSNP19) <= 134 & kME > 0.802
table(criteria3)
# criteria3
# FALSE  TRUE
#   527     8
# 4. Balanced threshold: Let's choose the 85%ile for all three variables. This
# corresponds to Table 3 of Fuller et al.
criteria4=GSweight > 0.541 & rank(-GSSNP19) <= 80 & kME > 0.751
table(criteria4)
# criteria4
# FALSE  TRUE
#   526     9
# 5. Strict GSweight and kME thresholds, relaxed GSSNP19 threshold: Let's choose the
# 85%ile for GSweight and kME, but the 75%ile for GSSNP19.
criteria5=GSweight > 0.541 & rank(-GSSNP19) <= 134 & kME > 0.751
table(criteria5)
# criteria5
# FALSE  TRUE
#   519    16
# We might be interested in the actual value of GSSNP19 that serves as the cutoff
# here.
GSSNP19[rank(-GSSNP19)==134]
#              MMT00025527
# GSmQTL19.047   0.1939032
GSSNP19[rank(-GSSNP19)==135]
#              MMT00029178
# GSmQTL19.047   0.1927229
# So, the cutoff for GSSNP19 in this case lies between 0.1927 and 0.1939.

# Signed correlation between expression and trait (GSweight):
signedGSweight=as.numeric(cor(weight, datExprBlue, use="p"))
# Signed correlation between SNP and expression (GSSNP):
signedGSSNP=data.frame(matrix(NA, nrow=dim(SNP)[[1]], ncol=dim(datExprBlue)[[2]]))
for (i in c(1:dim(SNP)[[1]])) {
  signedGSSNP[i,]=as.numeric(cor(as.numeric(SNP[i,]), datExprBlue, use="p"))
}
dimnames(signedGSSNP)[[1]]=paste("GS", as.character(dat1[1:9,1]), sep="")
dimnames(signedGSSNP)[[2]]=paste(as.character(dat1[-c(1:10),1]))
dim(signedGSSNP)
signedGSSNP19=signedGSSNP[9,]
dimnames(signedGSSNP19)[[2]]=paste(as.character(dat1[-c(1:10),1]))
# Signed correlation between SNP and trait (COR.weight), one value per SNP marker:
signedCOR.weight=rep(NA, dim(SNP)[[1]])
for (i in c(1:dim(SNP)[[1]])) {
  signedCOR.weight[i]=cor(as.numeric(SNP[i,]), weight, use="p")
}
signedCOR.weight19=signedCOR.weight[9]
# Now we check that the sign of GSSNP*GSweight is the same as the sign of
# COR.weight (if the SNP influences weight through the gene, the signs should agree).
# signedCOR.weight19 is positive, so the product should be positive:
signedCOR.weight19
MRgenes=as.numeric(signedGSSNP19)[criteria5]*signedGSweight[criteria5] > 0
table(MRgenes)
# MRgenes
# TRUE
#   16
# We create tables displaying the genes chosen by each criterion.
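The table-building blocks for the five criteria differ only in the logical selection vector. As a hedged alternative, a small helper function could build each table; makeCriterionTable is a hypothetical name that is not part of the original tutorial, and the sketch runs on toy inputs that mimic the structure of GSSNP19 (a one-row data frame with gene IDs as column names) and datSummary.

```r
# Hypothetical helper (not part of the original tutorial): build the table for one
# criterion from its logical selection vector 'sel'.
makeCriterionTable <- function(sel, GSSNP19, GSweight, kME, kIN, datSummary) {
  data.frame(
    GSmQTL19 = as.numeric(unlist(GSSNP19))[sel],  # GSSNP19 is a one-row data frame
    GSweight = GSweight[sel],
    kME = kME[sel],
    kIN = kIN[sel],
    Symbol = as.character(datSummary$genesymbol[sel]),
    Locus = as.character(datSummary$cytogeneticLoc[sel]),
    Chr = datSummary$CHROMOSOME[sel],
    row.names = names(GSSNP19)[sel]               # gene IDs become the row names
  )
}

# Toy inputs that mimic the structure of the tutorial objects
toySummary <- data.frame(genesymbol = c("A", "B", "C"),
                         cytogeneticLoc = c("9_37.0_cM", "0", "0"),
                         CHROMOSOME = c(9, 16, 3))
toyGSSNP19 <- data.frame(MMT1 = 0.30, MMT2 = 0.10, MMT3 = 0.25)
sel <- c(TRUE, FALSE, TRUE)
tab <- makeCriterionTable(sel, toyGSSNP19, GSweight = c(0.65, 0.20, 0.68),
                          kME = c(0.86, 0.50, 0.78), kIN = c(27.5, 5.0, 17.0),
                          datSummary = toySummary)
tab

# With the tutorial objects loaded, the five CSV files could then be written as:
# critList <- list(criteria1, criteria2, criteria3, criteria4, criteria5)
# for (j in seq_along(critList)) {
#   write.csv(makeCriterionTable(critList[[j]], GSSNP19, GSweight, kME, kIN, datSummary),
#             file = paste0("Criteria", j, "Table.csv"))
# }
```

Unlike the cbind()-based blocks, which store every column as text, this version keeps the numeric columns numeric; otherwise it is intended to produce the same column names and row names.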
# Criterion 1
# (cbind() below turns every column into character, so the table entries are
# stored as text.)
Criteria1Col1=t(GSSNP19[criteria1])
Criteria1Col2=GSweight[criteria1]
Criteria1Col3=kME[criteria1]
Criteria1Col4=kIN[criteria1]
Criteria1Col5=as.character(datSummary$genesymbol[criteria1])
Criteria1Col6=as.character(datSummary$cytogeneticLoc[criteria1])
Criteria1Col7=datSummary$CHROMOSOME[criteria1]
Criteria1Table=data.frame(cbind(Criteria1Col1, Criteria1Col2, Criteria1Col3,
  Criteria1Col4, Criteria1Col5, Criteria1Col6, Criteria1Col7))
colnames(Criteria1Table)=c("GSmQTL19","GSweight","kME","kIN","Symbol","Locus","Chr")
write.csv(Criteria1Table, file="Criteria1Table.csv")

# Criterion 2
Criteria2Col1=t(GSSNP19[criteria2])
Criteria2Col2=GSweight[criteria2]
Criteria2Col3=kME[criteria2]
Criteria2Col4=kIN[criteria2]
Criteria2Col5=as.character(datSummary$genesymbol[criteria2])
Criteria2Col6=as.character(datSummary$cytogeneticLoc[criteria2])
Criteria2Col7=datSummary$CHROMOSOME[criteria2]
Criteria2Table=data.frame(cbind(Criteria2Col1, Criteria2Col2, Criteria2Col3,
  Criteria2Col4, Criteria2Col5, Criteria2Col6, Criteria2Col7))
colnames(Criteria2Table)=c("GSmQTL19","GSweight","kME","kIN","Symbol","Locus","Chr")
write.csv(Criteria2Table, file="Criteria2Table.csv")

# Criterion 3
Criteria3Col1=t(GSSNP19[criteria3])
Criteria3Col2=GSweight[criteria3]
Criteria3Col3=kME[criteria3]
Criteria3Col4=kIN[criteria3]
Criteria3Col5=as.character(datSummary$genesymbol[criteria3])
Criteria3Col6=as.character(datSummary$cytogeneticLoc[criteria3])
Criteria3Col7=datSummary$CHROMOSOME[criteria3]
Criteria3Table=data.frame(cbind(Criteria3Col1, Criteria3Col2, Criteria3Col3,
  Criteria3Col4, Criteria3Col5, Criteria3Col6, Criteria3Col7))
colnames(Criteria3Table)=c("GSmQTL19","GSweight","kME","kIN","Symbol","Locus","Chr")
write.csv(Criteria3Table, file="Criteria3Table.csv")

# Criterion 4
Criteria4Col1=t(GSSNP19[criteria4])
Criteria4Col2=GSweight[criteria4]
Criteria4Col3=kME[criteria4]
Criteria4Col4=kIN[criteria4]
Criteria4Col5=as.character(datSummary$genesymbol[criteria4])
Criteria4Col6=as.character(datSummary$cytogeneticLoc[criteria4])
Criteria4Col7=datSummary$CHROMOSOME[criteria4]
Criteria4Table=data.frame(cbind(Criteria4Col1, Criteria4Col2, Criteria4Col3,
  Criteria4Col4, Criteria4Col5, Criteria4Col6, Criteria4Col7))
colnames(Criteria4Table)=c("GSmQTL19","GSweight","kME","kIN","Symbol","Locus","Chr")
write.csv(Criteria4Table, file="Criteria4Table.csv")

# Criterion 5
Criteria5Col1=t(GSSNP19[criteria5])
Criteria5Col2=GSweight[criteria5]
Criteria5Col3=kME[criteria5]
Criteria5Col4=kIN[criteria5]
Criteria5Col5=as.character(datSummary$genesymbol[criteria5])
Criteria5Col6=as.character(datSummary$cytogeneticLoc[criteria5])
Criteria5Col7=datSummary$CHROMOSOME[criteria5]
Criteria5Table=data.frame(cbind(Criteria5Col1, Criteria5Col2, Criteria5Col3,
  Criteria5Col4, Criteria5Col5, Criteria5Col6, Criteria5Col7))
colnames(Criteria5Table)=c("GSmQTL19","GSweight","kME","kIN","Symbol","Locus","Chr")
write.csv(Criteria5Table, file="Criteria5Table.csv")

# Code ends here

APPENDIX 1: Gene Tables

Table 1: Criterion 1 - stringent GSweight threshold. Shaded cells in Tables 1-4 indicate that a gene was found by two or more criteria (excluding Criterion 5).
Symbol         ID           GSmQTL19     GSweight     kME          kIN          Locus       Chr
Anxa2          MMT00067823  0.199332005  0.649756971  0.858735713  27.46522635  9_37.0_cM   9
F7             MMT00078851  0.264531063  0.667600913  0.852072156  27.39089814  8_7.0_cM    8
Kng2           MMT00065159  0.238095673  0.657918826  0.813741354  21.10660724  0           16
9430028I06Rik  MMT00078732  0.251091211  0.677714137  0.775352538  17.04628191  0           3
Slc43a1        MMT00061313  0.220163199  0.684326146  0.780452876  15.28572522  0           2
Tubb2          MMT00006300  0.196766683  0.65886256   0.714639559  13.10651526  13_16.0_cM  13
Apom           MMT00030931  0.283209065  0.683942018  0.733975103  12.43188671  0           17
Avpr1a         MMT00031229  0.228304509  0.63565915   0.749624652  10.9795002   0           10

Table 2: Criterion 2 - stringent GSSNP19 threshold

Symbol         ID           GSmQTL19     GSweight     kME          kIN          Locus       Chr
F7             MMT00078851  0.264531063  0.667600913  0.852072156  27.39089814  8_7.0_cM    8
Pdir           MMT00008463  0.253024502  0.617847121  0.797604763  19.225064    0           16
Slc30a2        MMT00071411  0.25034153   0.584915878  0.785751252  18.99907753  4_65.7_cM   4
9430028I06Rik  MMT00078732  0.251091211  0.677714137  0.775352538  17.04628191  0           3
Ang1           MMT00064235  0.286826688  0.605352022  0.795856801  17.01403746  14_18.0_cM  14
Fsp27          MMT00039459  0.305643173  0.612743555  0.754988325  15.47686791  0           6
Gpld1          MMT00016835  0.273250987  0.543108251  0.7709886    15.33582603  13_13.0_cM  13
Apom           MMT00030931  0.283209065  0.683942018  0.733975103  12.43188671  0           17
C86987         MMT00018643  0.337407792  0.547482326  0.712211385  10.86948446  0           13

Table 3: Criterion 3 - stringent kME threshold

Symbol         ID           GSmQTL19     GSweight     kME          kIN          Locus       Chr
Anxa2          MMT00067823  0.199332005  0.649756971  0.858735713  27.46522635  9_37.0_cM   9
F7             MMT00078851  0.264531063  0.667600913  0.852072156  27.39089814  8_7.0_cM    8
Anxa5          MMT00056866  0.219243631  0.602986463  0.840368035  25.12158728  3_19.2_cM   3
AI324046       MMT00026028  0.196410814  0.536962981  0.820409646  22.5415714   0           12
Kng2           MMT00065159  0.238095673  0.657918826  0.813741354  21.10660724  0           16
0              MMT00081689  0.210815145  0.542111685  0.814462318  20.06842564  0           12
Msx2           MMT00028683  0.204991147  0.505218479  0.803718296  18.81991381  13_32.0_cM  13
Fetub          MMT00067079  0.195641933  0.562950147  0.811471942  17.66042261  0           16

Table 4: Criterion 4 - balanced threshold

Symbol         ID           GSmQTL19     GSweight     kME          kIN          Locus       Chr
F7             MMT00078851  0.264531063  0.667600913  0.852072156  27.39089814  8_7.0_cM    8
Kng2           MMT00065159  0.238095673  0.657918826  0.813741354  21.10660724  0           16
Pdir           MMT00008463  0.253024502  0.617847121  0.797604763  19.225064    0           16
Slc30a2        MMT00071411  0.25034153   0.584915878  0.785751252  18.99907753  4_65.7_cM   4
9430028I06Rik  MMT00078732  0.251091211  0.677714137  0.775352538  17.04628191  0           3
Ang1           MMT00064235  0.286826688  0.605352022  0.795856801  17.01403746  14_18.0_cM  14
Fsp27          MMT00039459  0.305643173  0.612743555  0.754988325  15.47686791  0           6
Gpld1          MMT00016835  0.273250987  0.543108251  0.7709886    15.33582603  13_13.0_cM  13
Sh3d4          MMT00013759  0.237280683  0.604388051  0.788981054  14.93406009  14_34.5_cM  14

Table 5: Criterion 5 - stringent GSweight and kME thresholds, but relaxed GSSNP19 threshold. Genes in shaded boxes were found by at least two of the first four criteria. Genes with names in red were found by only one of the first four criteria. Unshaded genes with names in black are new genes found by Criterion 5.
Symbol         ID           GSmQTL19     GSweight     kME          kIN          Locus       Chr
Anxa2          MMT00067823  0.199332005  0.649756971  0.858735713  27.46522635  9_37.0_cM   9
F7             MMT00078851  0.264531063  0.667600913  0.852072156  27.39089814  8_7.0_cM    8
Anxa5          MMT00056866  0.219243631  0.602986463  0.840368035  25.12158728  3_19.2_cM   3
Kng2           MMT00065159  0.238095673  0.657918826  0.813741354  21.10660724  0           16
0              MMT00081689  0.210815145  0.542111685  0.814462318  20.06842564  0           12
Itih1          MMT00081331  0.216393156  0.583683222  0.78349104   19.31456836  0           14
Pdir           MMT00008463  0.253024502  0.617847121  0.797604763  19.225064    0           16
Slc30a2        MMT00071411  0.25034153   0.584915878  0.785751252  18.99907753  4_65.7_cM   4
Fetub          MMT00067079  0.195641933  0.562950147  0.811471942  17.66042261  0           16
9430028I06Rik  MMT00078732  0.251091211  0.677714137  0.775352538  17.04628191  0           3
Ang1           MMT00064235  0.286826688  0.605352022  0.795856801  17.01403746  14_18.0_cM  14
Fsp27          MMT00039459  0.305643173  0.612743555  0.754988325  15.47686791  0           6
Gpld1          MMT00016835  0.273250987  0.543108251  0.7709886    15.33582603  13_13.0_cM  13
Slc43a1        MMT00061313  0.220163199  0.684326146  0.780452876  15.28572522  0           2
Sh3d4          MMT00013759  0.237280683  0.604388051  0.788981054  14.93406009  14_34.5_cM  14
Mat1a          MMT00013203  0.220571538  0.556804531  0.76276254   14.1241317   0           14

APPENDIX 2 - Gene Ontology Information

All information below was retrieved from the Gene Ontology classifications on the Mouse Genome Informatics website [i].
Criterion 1 - Stringent GSweight

Anxa2 (annexin A2)
  processes: angiogenesis, collagen fibril organization, fibrinolysis
  function: calcium ion binding, calcium-dependent phospholipid binding, cytoskeletal protein binding, phospholipase inhibitor activity, protein binding
F7 (coagulation factor 7)
  processes: blood coagulation, metabolism, proteolysis
  function: calcium ion binding, coagulation factor VIIa activity, hydrolase activity, oxidoreductase activity, peptidase activity, serine-type endopeptidase activity
Kng2 (kininogen 2)
  no ontology data available on this website
9430028I06Rik (Lrrc39, leucine rich repeat containing 39)
  function: transferase activity
Slc43a1 (solute carrier family 43, member 1)
  processes: amino acid transport, L-amino acid transport, transport
  function: amino acid transporter activity, L-amino acid transporter activity
Tubb2 (tubulin, beta). Note: this name is ambiguous, as it could refer to either Tubb2a or Tubb2b. The gene ontology information below is derived from the Tubb2a listings.
  processes: microtubule-based process
  function: GTP binding, nucleotide binding, structural constituent of cytoskeleton
Apom (apolipoprotein M)
  processes: lipid transport, transport
  function: binding, lipid transporter activity
Avpr1a (arginine vasopressin receptor 1a)
  processes: G-protein coupled receptor protein signaling pathway, signal transduction
  function: G-protein coupled receptor activity, receptor activity, rhodopsin-like receptor activity, signal transducer activity, vasopressin receptor activity

Criterion 2 - Stringent GSSNP19

F7 (coagulation factor 7)
  processes: blood coagulation, metabolism, proteolysis
  function: calcium ion binding, coagulation factor VIIa activity, hydrolase activity, oxidoreductase activity, peptidase activity, serine-type endopeptidase activity
Pdir (not found)
Slc30a2 (solute carrier family 30 - zinc transporter, member 2)
  processes: biological process unknown
  function: molecular function unknown
9430028I06Rik (Lrrc39, leucine rich repeat containing 39)
  processes: none available
  function: transferase activity
Ang1 (angiogenin, ribonuclease A family, member 1)
  processes: angiogenesis, cell differentiation, development, negative regulation of protein biosynthesis
  function: endonuclease activity, hydrolase activity, nuclease activity, nucleic acid binding, pancreatic ribonuclease activity
Fsp27 (aka Cidec, cell death-inducing DFFA-like effector c)
  processes: apoptosis, induction of apoptosis
  function: protein binding
Gpld1 (glycosylphosphatidylinositol specific phospholipase D1)
  processes: GPI anchor release
  function: glycosylphosphatidylinositol phospholipase D activity, hydrolase activity, lipid transporter activity, phospholipase D activity
Apom (apolipoprotein M)
  processes: lipid transport, transport
  function: binding, lipid transporter activity
C86987 (aka Ung2, uracil DNA glycosylase 2)
  no ontology data available on this website

Criterion 3 - Stringent kME

Anxa2 (annexin A2)
  processes: angiogenesis, collagen fibril organization, fibrinolysis
  function: calcium ion binding, calcium-dependent phospholipid binding, cytoskeletal protein binding, phospholipase inhibitor activity, protein binding
F7 (coagulation factor 7)
  processes: blood coagulation, metabolism, proteolysis
  function: calcium ion binding, coagulation factor VIIa activity, hydrolase activity, oxidoreductase activity, peptidase activity, serine-type endopeptidase activity
Anxa5 (annexin A5)
  processes: blood coagulation, negative regulation of coagulation
  function: calcium ion binding, calcium-dependent phospholipid binding
AI324046
  processes: none available
  function: antigen binding
Kng2 (kininogen 2)
  no ontology data available on this website
Msx2 (homeo box, msh-like 2)
  processes: development, embryonic limb morphogenesis, regulation of transcription, regulation of transcription, DNA-dependent
  function: DNA binding, protein binding, sequence-specific DNA binding, transcription factor activity
Fetub (fetuin beta)
  processes: none available
  function: cysteine protease inhibitor activity

Criterion 4 - Balanced thresholds

F7 (coagulation factor 7)
  processes: blood coagulation, metabolism, proteolysis
  function: calcium ion binding, coagulation factor VIIa activity, hydrolase activity, oxidoreductase activity, peptidase activity, serine-type endopeptidase activity
Kng2 (kininogen 2)
  no ontology data available on this website
Pdir (not found)
Slc30a2 (solute carrier family 30 - zinc transporter, member 2)
  processes: biological process unknown
  function: molecular function unknown
9430028I06Rik (Lrrc39, leucine rich repeat containing 39)
  function: transferase activity
Ang1 (angiogenin, ribonuclease A family, member 1)
  processes: angiogenesis, cell differentiation, development, negative regulation of protein biosynthesis
  function: endonuclease activity, hydrolase activity, nuclease activity, nucleic acid binding, pancreatic ribonuclease activity
Fsp27 (aka Cidec, cell death-inducing DFFA-like effector c)
  processes: apoptosis, induction of apoptosis
  function: protein binding
Gpld1 (glycosylphosphatidylinositol specific phospholipase D1)
  processes: GPI anchor release
  function: glycosylphosphatidylinositol phospholipase D activity, hydrolase activity, lipid transporter activity, phospholipase D activity
Sh3d4 (aka sorbin and SH3 domain containing 3)
  processes: cell adhesion, cell-substrate adhesion, negative regulation of transcription from RNA polymerase II promoter, positive regulation of MAPKKK cascade, transport
  function: protein binding, transcription factor binding

Criterion 5 - Stringent GSweight and kME thresholds, relaxed GSSNP19 threshold

Anxa2 (annexin A2)
  processes: angiogenesis, collagen fibril organization, fibrinolysis
  function: calcium ion binding, calcium-dependent phospholipid binding, cytoskeletal protein binding, phospholipase inhibitor activity, protein binding
F7 (coagulation factor 7)
  processes: blood coagulation, metabolism, proteolysis
  function: calcium ion binding, coagulation factor VIIa activity, hydrolase activity, oxidoreductase activity, peptidase activity, serine-type endopeptidase activity
Anxa5 (annexin A5)
  processes: blood coagulation, negative regulation of coagulation
  function: calcium ion binding, calcium-dependent phospholipid binding
Kng2 (kininogen 2)
  no ontology data available on this website
Itih1 (inter-alpha (globulin) inhibitor, H1 polypeptide, Intin1, Itih-1)
  processes: hyaluronan metabolism
  function: copper ion binding, endopeptidase inhibitor activity, serine-type endopeptidase inhibitor activity
Pdir (not found)
Slc30a2 (solute carrier family 30 - zinc transporter, member 2)
  processes: biological process unknown
  function: molecular function unknown
Fetub (fetuin beta)
  processes: none available
  function: cysteine protease inhibitor activity
9430028I06Rik (Lrrc39, leucine rich repeat containing 39)
  processes: none available
  function: transferase activity
Ang1 (angiogenin, ribonuclease A family, member 1)
  processes: angiogenesis, cell differentiation, development, negative regulation of protein biosynthesis
  function: endonuclease activity, hydrolase activity, nuclease activity, nucleic acid binding, pancreatic ribonuclease activity
Fsp27 (aka Cidec, cell death-inducing DFFA-like effector c)
  processes: apoptosis, induction of apoptosis
  function: protein binding
Gpld1 (glycosylphosphatidylinositol specific phospholipase D1)
  processes: GPI anchor release
  function: glycosylphosphatidylinositol phospholipase D activity, hydrolase activity, lipid transporter activity, phospholipase D activity
Slc43a1 (solute carrier family 43, member 1)
  processes: amino acid transport, L-amino acid transport, transport
  function: amino acid transporter activity, L-amino acid transporter activity
Sh3d4 (aka sorbin and SH3 domain containing 3)
  processes: cell adhesion, cell-substrate adhesion, negative regulation of transcription from RNA polymerase II promoter, positive regulation of MAPKKK cascade, transport
  function: protein binding, transcription factor binding
Mat1a (methionine adenosyltransferase I, alpha)
  processes: one-carbon compound metabolism
  function: ATP binding, magnesium ion binding, metal ion binding, methionine adenosyltransferase activity, nucleotide binding, potassium ion binding, transferase activity

[i] Mouse Genome Informatics, http://www.informatics.jax.org/ (last accessed 9/25/06).