ASSOCIATION ANALYSIS OF GALECTIN GENE PROMOTER POLYMORPHISMS WITH MULTIPLE CANCERS

Dean Wu
B.S., University of California, Santa Cruz, 2004

THESIS

Submitted in partial satisfaction of the requirements for the degree of MASTER OF SCIENCE in BIOLOGICAL SCIENCES (Molecular and Cellular Biology) at CALIFORNIA STATE UNIVERSITY, SACRAMENTO

SPRING 2011

A Thesis by Dean Wu

Approved by:
Nicholas N. Ewing, Ph.D., Committee Chair
Brett Holland, Ph.D., Second Reader
Fu-Tong Liu, M.D., Ph.D., Third Reader

Student: Dean Wu

I certify that this student has met the requirements for format contained in the University format manual, and that this thesis is suitable for shelving in the Library and credit is to be awarded for the thesis.

Susanne Lindgren, Ph.D., Graduate Coordinator
Department of Biological Sciences

Abstract of ASSOCIATION ANALYSIS OF GALECTIN GENE PROMOTER POLYMORPHISMS WITH MULTIPLE CANCERS by Dean Wu

Galectins are a family of carbohydrate-binding proteins with diverse functions in a wide range of cellular processes. A number of galectins play roles in tumorigenesis and cancer progression, and altered expression of various galectins has been observed in numerous malignancies. As promoter polymorphisms have been linked to expression differences, the goal of this thesis was to utilize public genome databases in order to locate single nucleotide polymorphisms (SNPs) in human galectin promoters, particularly those overlapping putative transcription factor and methylation sites, and investigate any possible association with cancer susceptibility.
It was hypothesized that galectin promoter SNPs overlapping methylation and transcription factor binding sites may be associated with various cancers exhibiting altered galectin expression. In order to test this hypothesis, the objectives of this project are summarized as follows:

1: Parse the Oncomine database for expression differences in galectins 1-4 and 7-10 between cancerous and normal tissue.
2: Locate SNPs in upstream regulatory regions of genes coding for the above galectins using the International HapMap Project database.
3: Screen SNP sequences for putative overlapping transcription factor binding sites in upstream regulatory regions using fSNP, Consite, and TESS (TRANSFAC and IMD) search platforms.
4: Locate individual CpG sites overlaying upstream SNPs. Search for known CpG islands coinciding with the SNP sites using the UCSC Genome Browser and inspect sequences for putative CpG islands using CpG Island Searcher, CpG Plot, and CpG Island Explorer.
5: Screen Illumina whole-genome SNP data archived at the Gene Expression Omnibus for association between presence of galectin promoter SNPs and cancers showing significant galectin expression differences.

A possible association between the rs3763959 polymorphism upstream of the galectin-9 start site and human breast carcinoma was observed, along with a more tenuous but still statistically significant association of the galectin-1 upstream polymorphism rs4820294 with a single melanoma dataset.

Nicholas N. Ewing, Ph.D., Committee Chair

DEDICATION

This thesis is dedicated to the anonymous study participants – many of whom succumbed to their cancers – who provided tissue samples for the genotyping projects that contributed data to the databases this thesis drew so heavily from.

ACKNOWLEDGMENTS

First, I would like to thank Dr. Fu-Tong Liu for his great generosity in allowing me the use of his laboratory facilities.
The Liu lab members have offered valuable insights, particularly Dr. Daniel Hsu, who first introduced me to Oncomine, GEO, and other online databases. I would also like to thank the other members of my committee, Dr. Nicholas Ewing and Dr. Brett Holland, for their indispensable help and support. Finally, this work would not have been possible without the many researchers cited herein who made the data from their projects publicly available.

TABLE OF CONTENTS

Dedication
Acknowledgments
List of Tables
List of Figures
INTRODUCTION
    Overview of the galectin family
    Galectin structure and carbohydrate binding
    Intracellular functions of galectins
    Extracellular functions of galectins
    Galectin-modulated immunoregulation
    Galectins and cancer
    Transcriptional regulation of galectin genes
OBJECTIVES
MATERIALS AND METHODS
    Oncomine
    SNP selection using the HapMap SNP database
    Transcription factor binding site location
    Location of CpG sites and islands
    Genomic data and association analysis
RESULTS
    Differential expression of galectins in Oncomine cancers
    Promoter SNPs located on HapMap database
    Multiple predicted transcription factor binding sites
    Putative methylation sites coinciding with galectin promoter SNPs
    Significant genotype and allele frequency differences uncovered
DISCUSSION
    Galectin-9 and breast cancer
    E2F binding site at rs3763959
    Methylation at rs3763959
    Conclusions and future directions
APPENDICES
Appendix A. Oncomine results
    1. List of individual Oncomine datasets cited
    2. Normal vs. cancer, LGALS1 nonlymphoid/noncolon
        a. Expression differences, p-value threshold of 1E-2
        b. Individual studies, expression difference significant at 0.9 log-normalized expression units
    3. Normal vs. cancer, LGALS2 nonlymphoid/noncolon
        a. Expression differences, p-value threshold of 1E-2
        b. Individual studies, expression difference significant at 0.9 log-normalized expression units
    4. Normal vs. cancer, LGALS3 nonlymphoid/noncolon
        a. Expression differences, p-value threshold of 1E-2
        b. Individual studies, expression difference significant at 0.9 log-normalized expression units
    5. Normal vs. cancer, LGALS4 nonlymphoid/noncolon
        a. Expression differences, p-value threshold of 1E-2
        b. Individual studies, expression difference significant at 0.9 log-normalized expression units
    6. Normal vs. cancer, LGALS7 nonlymphoid/noncolon
        a. Expression differences, p-value threshold of 1E-2
        b. Individual studies, expression difference significant at 0.9 log-normalized expression units
    7. Normal vs. cancer, LGALS8 nonlymphoid/noncolon
        a. Expression differences, p-value threshold of 1E-2
        b. Individual studies, expression difference significant at 0.9 log-normalized expression units
    8. Normal vs. cancer, LGALS9 nonlymphoid/noncolon
        a. Expression differences, p-value threshold of 1E-2
        b. Individual studies, expression difference significant at 0.9 log-normalized expression units
    9. Normal vs. cancer, CLC/LGALS10 nonlymphoid/noncolon
        a. Expression differences, p-value threshold of 1E-2
        b. Individual studies, expression difference significant at 0.9 log-normalized expression units
Appendix B. Association analysis, genotype frequencies
    1. rs428007 (CLC/LGALS10)
        a. GSE16019: Chen M et al. (2009)
        b. GSE13282: Gordan et al. (2008)
        c. GSE19189: Letouze et al. (2010)
        d. GSE10506: Nancarrow et al. (2008)
        e. GSE18799: Popova et al. (2009)
        f. GSE9003: Stark & Hayward (2007)
        g. GSE19177: Waddell et al. (2010)
    2. rs929039 (LGALS1)
        a. GSE16019: Chen M et al. (2009)
        b. GSE13282: Gordan et al. (2008)
        c. GSE19189: Letouze et al. (2010)
        d. GSE10506: Nancarrow et al. (2008)
        e. GSE18799: Popova et al. (2009)
        f. GSE9003: Stark & Hayward (2007)
        g. GSE19177: Waddell et al. (2010)
    3. rs2235338 (LGALS2)
        a. GSE21168: Castillo et al. (2010)
        b. GSE16019: Chen M et al. (2009)
        c. GSE13282: Gordan et al. (2008)
        d. GSE19189: Letouze et al. (2010)
        e. GSE10506: Nancarrow et al. (2008)
        f. GSE18799: Popova et al. (2009)
        g. GSE9003: Stark & Hayward (2007)
        h. GSE19177: Waddell et al. (2010)
    4. rs3763959 (LGALS9)
        a. GSE21168: Castillo et al. (2010)
        b. GSE16019: Chen M et al. (2009)
        c. GSE13282: Gordan et al. (2008)
        d. GSE19189: Letouze et al. (2010)
        e. GSE10506: Nancarrow et al. (2008)
        f. GSE18799: Popova et al. (2009)
        g. GSE9003: Stark & Hayward (2007)
        h. GSE19177: Waddell et al. (2010)
    5. rs4820294 (LGALS1)
        a. GSE21168: Castillo et al. (2010)
        b. GSE16019: Chen M et al. (2009)
        c. GSE13282: Gordan et al. (2008)
        d. GSE19189: Letouze et al. (2010)
        e. GSE10506: Nancarrow et al. (2008)
        f. GSE18799: Popova et al. (2009)
        g. GSE9003: Stark & Hayward (2007)
        h. GSE19177: Waddell et al. (2010)
    6. rs10403583 (LGALS4)
        a. GSE21168: Castillo et al. (2010)
        b. GSE16019: Chen M et al. (2009)
        c. GSE13282: Gordan et al. (2008)
        d. GSE19189: Letouze et al. (2010)
        e. GSE10506: Nancarrow et al. (2008)
        f. GSE18799: Popova et al. (2009)
        g. GSE9003: Stark & Hayward (2007)
        h. GSE19177: Waddell et al. (2010)
    7. rs10489789 (LGALS8)
        a. GSE16019: Chen M et al. (2009)
        b. GSE13282: Gordan et al. (2008)
        c. GSE19189: Letouze et al. (2010)
        d. GSE10506: Nancarrow et al. (2008)
        e. GSE18799: Popova et al. (2009)
        f. GSE9003: Stark & Hayward (2007)
        g. GSE19177: Waddell et al. (2010)
Appendix C. Association analysis, allele frequencies
    1. rs428007 (CLC/LGALS10)
        a. GSE16019: Chen M et al. (2009)
        b. GSE13282: Gordan et al. (2008)
        c. GSE19189: Letouze et al. (2010)
        d. GSE10506: Nancarrow et al. (2008)
        e. GSE18799: Popova et al. (2009)
        f. GSE9003: Stark & Hayward (2007)
        g. GSE19177: Waddell et al. (2010)
    2. rs929039 (LGALS1)
        a. GSE16019: Chen M et al. (2009)
        b. GSE13282: Gordan et al. (2008)
        c. GSE19189: Letouze et al. (2010)
        d. GSE10506: Nancarrow et al. (2008)
        e. GSE18799: Popova et al. (2009)
        f. GSE9003: Stark & Hayward (2007)
        g. GSE19177: Waddell et al. (2010)
    3. rs2235338 (LGALS2)
        a. GSE21168: Castillo et al. (2010)
        b. GSE16019: Chen M et al. (2009)
        c. GSE13282: Gordan et al. (2008)
        d. GSE19189: Letouze et al. (2010)
        e. GSE10506: Nancarrow et al. (2008)
        f. GSE18799: Popova et al. (2009)
        g. GSE9003: Stark & Hayward (2007)
        h. GSE19177: Waddell et al. (2010)
    4. rs3763959 (LGALS9)
        a. GSE21168: Castillo et al. (2010)
        b. GSE16019: Chen M et al. (2009)
        c. GSE13282: Gordan et al. (2008)
        d. GSE19189: Letouze et al. (2010)
        e. GSE10506: Nancarrow et al. (2008)
        f. GSE18799: Popova et al. (2009)
        g. GSE9003: Stark & Hayward (2007)
        h. GSE19177: Waddell et al. (2010)
    5. rs4820294 (LGALS1)
        a. GSE21168: Castillo et al. (2010)
        b. GSE16019: Chen M et al. (2009)
        c. GSE13282: Gordan et al. (2008)
        d. GSE19189: Letouze et al. (2010)
        e. GSE10506: Nancarrow et al. (2008)
        f. GSE18799: Popova et al. (2009)
        g. GSE9003: Stark & Hayward (2007)
        h. GSE19177: Waddell et al. (2010)
    6. rs10403583 (LGALS4)
        a. GSE21168: Castillo et al. (2010)
        b. GSE16019: Chen M et al. (2009)
        c. GSE13282: Gordan et al. (2008)
        d. GSE19189: Letouze et al. (2010)
        e. GSE10506: Nancarrow et al. (2008)
        f. GSE18799: Popova et al. (2009)
        g. GSE9003: Stark & Hayward (2007)
        h. GSE19177: Waddell et al. (2010)
    7. rs10489789 (LGALS8)
        a. GSE16019: Chen M et al. (2009)
        b. GSE13282: Gordan et al. (2008)
        c. GSE19189: Letouze et al. (2010)
        d. GSE10506: Nancarrow et al. (2008)
        e. GSE18799: Popova et al. (2009)
        f. GSE9003: Stark & Hayward (2007)
        g. GSE19177: Waddell et al. (2010)
Appendix D. SNP genotypes in the GSE11976 dataset
Appendix E. G-test results for Hardy-Weinberg equilibrium
    1. rs428007 (CLC/LGALS10)
    2. rs929039 (LGALS1)
    3. rs2235338 (LGALS2)
    4. rs3763959 (LGALS9)
    5. rs4820294 (LGALS1)
    6. rs10403583 (LGALS4)
    7. rs10489789 (LGALS8)
Literature Cited

LIST OF TABLES

Table 1. Relevant Illumina cancer SNP datasets archived at the Gene Expression Omnibus
Table 2. Differential expression of galectins in nonlymphatic tissue (cancer vs. normal), from Oncomine
Table 3. Illumina BeadArray SNPs in upstream promoter regions of galectin genes
Table 4. Predicted transcription factor binding sites overlapping with SNPs (both reference and nonreference alleles) in upstream promoter regions of galectin genes
Table 5. Individual CpG sites and putative CpG islands at SNP locations
Table 6. SNPs exhibiting significant genotype and/or allele frequency differences in study datasets when compared to HapMap CEU population only

LIST OF FIGURES

Figure 1. The carbohydrate-binding domain (CRD) of galectin-3 displayed in 3D Mol-Viewer, Vector NTI Advance 10.3.1 demo mode (©2007 Invitrogen Corporation, http://www.invitrogen.com); shown with ligand (dark structure) at upper right (PDB ID: 2nmo, from Collins et al., 2007)
Figure 2. Structural categorization of galectins showing prototype (1, 2, 5, 7, 10, 11, 13, 14, 15, 16, 17), tandem repeat (4, 6, 8, 9, 12), and chimeric (3) galectins
Figure 3. Galectin-3 molecules can associate with each other at the N-terminus (left), leading to pentamer formation (right).
Figure 4. Homodimers of prototype galectins (light circles) bind to glycans (black crosses) to form lattices.
Figure 5. Tandem-repeat galectins containing two different CRDs (light and dark circles, connected by linker regions) can form lattice structures with bivalent ligands displaying distinct saccharide groups, shown here as crosses with differently-colored arms.

INTRODUCTION

Overview of the galectin family

The galectins are a family of carbohydrate-binding proteins distinguished by the criteria of binding to β-galactosides (Massa et al., 1993) and/or the presence of characteristic conserved sequence elements in the carbohydrate-binding domain (CRD) (Liu, 2000; Nakahara & Raz, 2006). This protein family is believed to be ancient, and is present in chordates, nematodes, insects, sponges, and fungi with the exception of budding yeast (Boulianne et al., 2000). Houzelstein et al.
(2008) hypothesized that a series of gene duplication events resulted in the divergence of vertebrate galectins.

The first galectin to be described was electrolectin, isolated by Teichberg et al. (1975) from the electric organ of the gymnotid electric "eel" Electrophorus electricus. It was not until additional galectins had been discovered (Barondes et al., 1994a) that the term "galectin" was adopted to describe this protein family. Granulocytic bodies containing Charcot-Leyden crystal protein (CLC, also known as galectin-10) were discovered by Jean-Martin Charcot in the 19th century, although the carbohydrate-binding properties of CLC were not noted until much later (Leonidas et al., 1995).

Galectins are synthesized on free ribosomes (Rabinovich et al., 2002) and lack a transmembrane domain or a classical secretory signal sequence (Barondes et al., 1994b; Liu, 2000), although they are secreted across the plasma membrane and intracellular membranes (Nakahara et al., 2006; Ochieng et al., 2004) in addition to localizing in the cytoplasm. Because galectins are able to recognize and bind cell surface glycoproteins and glycolipids (Guévremont et al., 2004; Yang et al., 2008), extracellular galectins can induce cross-linking of surface glycoproteins on adjacent cells of the same or different types (Brewer et al., 2002; Lagana et al., 2006) and thereby trigger transmembrane signaling cascades (Stillmann et al., 2006; He & Baum, 2006; Camby et al., 2006). In addition to their extracellular functions, galectins are known to operate in a variety of intracellular regulatory networks (Liu et al., 2002). Involvement of extracellular galectin cross-linking has further been proposed in host-pathogen interactions (Rabinovich & Toscano, 2009).
Galectin structure and carbohydrate binding

The carbohydrate-binding sites of galectin CRDs (Figure 1) are able to accommodate galactoside saccharides of varying composition, although each galectin exhibits fine specificities for certain saccharides (Barondes et al., 1988; Nakahara et al., 2005). Three consecutive exons code for the CRD in known mammalian galectin genes, with the middle exon accounting for the vast majority of the conserved carbohydrate-binding residues characteristic of the galectin family (Barondes et al., 1994b). Crystallographic analysis of CRD structure indicates that this domain typically is around 130 amino acids in length, with a highly conserved secondary structure consisting of five- to six-stranded β-sheets (Rabinovich et al., 2002).

Figure 1. The carbohydrate-binding domain (CRD) of galectin-3 displayed in 3D Mol-Viewer, Vector NTI Advance 10.3.1 demo mode (©2007 Invitrogen Corporation, http://www.invitrogen.com); shown with ligand (dark structure) at upper right (PDB ID: 2nmo, from Collins et al., 2007).

Galectins can be broadly divided into three structural types (Figure 2) based on the organization of their protein domains (Rabinovich et al., 2002). Prototype galectins (galectins 1, 2, 5, 7, 10, 11, and 13-17; Than et al., 2009) contain a single CRD and may exist as monomers or as noncovalent homodimers of two identical subunits. In contrast, tandem repeat-type galectins (galectins 4, 6, 8, 9, and 12) consist of two distinct CRDs joined in tandem by a single linker region (Hsu & Liu, 2004). Finally, galectin-3 is the sole known chimera-type galectin (Mazurek et al., 2000). Structurally, galectin-3 is a 31-kDa protein consisting of a single CRD at the C-terminus and an N-terminus composed of mostly short tandem repeats with no carbohydrate-binding function (Liu et al., 2002).
This N-terminus, unique to galectin-3, is the site of self-association with other galectin-3 molecules (Figure 3) to form pentamers (Hsu et al., 1992; Yang et al., 2008) and has also been proposed to play a role in intracellular localization (Gong et al., 1999).

Figure 2. Structural categorization of galectins showing prototype (1, 2, 5, 7, 10, 11, 13, 14, 15, 16, 17), tandem repeat (4, 6, 8, 9, 12), and chimeric (3) galectins.

Figure 3. Galectin-3 molecules can associate with each other at the N-terminus (left), leading to pentamer formation (right).

Intracellular functions of galectins

Studies have demonstrated a wide variety of intracellular functions of galectins, some of which may be independent of their carbohydrate-binding activity (Yang et al., 2008). It has been shown that galectins-1 and -3 play a role in pre-mRNA splicing (Vyakarnam et al., 1997; Liu, 2005), and both galectins are associated with the SMN complex during snRNP assembly (Park et al., 2001). Intracellular galectins have further been demonstrated to influence apoptosis, cell growth, and cell cycle regulation (Liu et al., 2002; Hernandez & Baum, 2002).

Expression of galectin-3 confers resistance to apoptosis (Yoshii et al., 2002), although this lectin is able to induce apoptosis when delivered extracellularly (Stillmann et al., 2006; Fukumori et al., 2003). Galectin-3 shares considerable sequence similarity with the known apoptosis-suppressing molecule BCL-2, with which it has been shown to interact in vitro (Yang et al., 1996). The exact mechanism for this inhibition of apoptosis has not yet been elucidated, though a number of possibilities have been suggested (Nakahara et al., 2005).
However, galectin-3 has been shown to translocate to the perinuclear mitochondrial membrane, where it prevents mitochondrial damage and inhibits the release of cytochrome c (Yu et al., 2002; Fukumori et al., 2006), a known component of the mitochondrial apoptotic pathway released in response to pro-apoptotic stimuli (Cereghetti & Scorrano, 2006; Dimmer & Scorrano, 2006). Other pathways may also be involved in galectin-3-mediated apoptotic regulation (Liu, 2000; Fukumori et al., 2004; Oka et al., 2005). In contrast, studies have shown intracellular galectin-7 to promote apoptosis in epithelial cells, possibly by affecting the expression of apoptotic regulators (Liu et al., 2002).

Extracellular functions of galectins

Many extracellular functions of galectins are related to their cross-linking ability. Inherent structural characteristics of galectins facilitate the formation of extensive galectin-glycan lattices (Yang et al., 2008). Homodimerization in prototype galectins and the heterodimeric structure of tandem-repeat galectins enable simultaneous bivalent saccharide binding (Brewer, 2002), while galectin-3 molecules can self-associate to form pentamers (Figure 3). In both cases, the binding of galectins to multivalent carbohydrates can result in lattice formation (Brewer et al., 2002; Yang et al., 2008). These lattices (Figures 4 and 5) have been shown to function in the organization of plasma membrane domains as well as signaling regulation (Garner & Baum, 2008), and raft-associated galectin-3 is believed to modulate dendritic cell migration (Hsu et al., 2009b). Braccia et al. (2003) found that galectin-4 surface lattices stabilized lipid raft formation in intestinal cell microvilli, and other studies have shown a role for galectin-3 lattices in apical sorting of non-raft glycoproteins (Delacour et al., 2006).
Galectin-3 has been determined to play a major role in macrophage function, localizing in phagocytic cups and phagosomes (Sano et al., 2003) including mycobacterial phagosomes (Beatty et al., 2002); in addition, extracellular galectin-3 is a powerful chemoattractant in human monocytes and macrophages (Sano et al., 2000; Kuwabara et al., 2003). Galectin-3-deficient macrophages showed a markedly diminished phagocytic capability, both in vitro and in vivo (Sano et al., 2003).

Figure 4. Homodimers of prototype galectins (light circles) bind to glycans (black crosses) to form lattices.

Figure 5. Tandem-repeat galectins containing two different CRDs (light and dark circles, connected by linker regions) can form lattice structures with bivalent ligands displaying distinct saccharide groups, shown here as crosses with differently-colored arms.

Galectins can facilitate cell adhesion by binding to glycoproteins and glycolipids located on adjacent cells and/or the extracellular matrix (Yang et al., 2008), although early research showed that the binding of cell-surface galectin-1 (Gu et al., 1994), galectin-3 (Sato & Hughes, 1992), and galectin-8 (Hadari et al., 2000) was capable of inhibiting adhesion across a range of cell types by interfering with laminin-integrin interaction. Numerous ligands, bound by different galectins, have been identified in many different cell types as involved in cell-cell or cell-matrix adhesion (Kuwabara et al., 2003). Galectin-3 binding has been linked to neutrophil activation (Nieminen et al., 2005) as well as mediating adhesion and extravasation in neutrophils (Sato et al., 2002).

Galectin-modulated immunoregulation

Various other immune functions have been attributed to extracellular galectin interactions (Rabinovich et al., 2002).
Galectins have been shown to play a role in inflammation and inflammatory response regulation both in vitro and in vivo, suggesting important roles in both adaptive and innate immunity (Rabinovich et al., 1999b; Zuberi et al., 1994; Bernardes et al., 2006). Extracellular galectin-1 has been shown to inhibit activation of T lymphocytes and induce arrest and apoptosis in already-activated T cells (Perillo et al., 1995; Nguyen et al., 2001) via modulation of T-cell receptor signaling (Garner & Baum, 2008). Cell-surface galectin-1 in T-cells binds to a discrete set of glycoproteins including CD3, CD4, CD7, CD43, and CD45, which are known to be involved in T-cell development and activation, and galectin-1 interaction with these glycoproteins in galectin-glycan lattices has been implicated in the modulation of T-cell receptor signaling (Liu et al., 2008). Galectin-3 has also been shown to regulate lymphocyte function (Peng et al., 2008), and galectin-3 cross-linking and lattice formation has been proposed to inhibit T-cell activation by interfering with T-cell receptor clustering (Demetriou et al., 2001). In T-cells, galectin-3 takes on a dual role with regard to apoptosis: intracellularly, galectin-3 not only has an anti-apoptotic function but also promotes T-cell proliferation (Dhirapong et al., 2009), whereas extracellular galectin-3 induces apoptosis as mentioned previously. Galectins have been demonstrated to modulate cytokine secretion and T-cell activation and development (Hsu et al., 2009a), including that of T helper cells (Ilarregui et al., 2005). Activated T helper cells differentiate into TH1 and TH2 effector cells depending on the cytokines present, with TH1 cells associated with inflammation and delayed-type hypersensitivity and TH2 cells associated with parasite clearance and allergy (Toscano et al., 2007).
Galectin-1 was shown to inhibit pro-inflammatory cytokines (Rabinovich et al., 1999a), resulting in a shift towards a TH2 response (Santucci et al., 2003; Motran et al., 2008). Moreover, TH2 cells were not as susceptible to galectin-1 induced cell death as were TH1 cells (Toscano et al., 2007). Galectin-2 has also been indicated to favor a TH2 response (Ilarregui et al., 2005). Additionally, Zuberi et al. (2004) suggested a T helper regulatory role for galectin-3, noting that galectin-3 knockout mice exhibited reduced TH2 and elevated TH1 response following antigen challenge and subsequent airway inflammation. However, Cortegano et al. (1998) reported that exogenous galectin-3 inhibited IL-5, a major TH2 cytokine, in certain cell lines. The aforementioned galectin-modulated effects on T-cell response have been implicated in allergic and autoimmune inflammation (Rabinovich et al., 2002; Rabinovich et al., 2007). Galectin cross-linking has been linked to the regulation of immune synapse formation between T-cells and antigen presenting cells (Laderach et al., 2010). In already-activated CD4-positive T-cells, galectin-3 was observed to localize to the cytoplasmic side of the immunological synapse, suggesting that this galectin destabilizes synapse formation by promoting T-cell receptor downregulation via an intracellular pathway (Chen HY et al., 2009). In contrast, galectin-1 expressed on the surface of stromal cells has been suggested to bind to pre-B-cell receptor, thereby aiding in the formation of the pre-B-cell/stromal cell synapse and facilitating signaling during B-cell development (Gauthier et al., 2002).
Saccharide binding by galectins has been shown to play a role in host-pathogen interactions (Rabinovich and Toscano, 2009), either by directly modulating pathogen recognition and invasion through interaction with pathogen saccharides at the cell surface (Beatty et al., 2002; Vray et al., 2004; Okumura et al., 2008) or by affecting immune responses through various signaling pathways (Hsu et al., 2006). Moreover, host galectin functions may also be exploited by bacterial, viral, or parasitic pathogens. Galectin-3 has been demonstrated in mice to contribute to Toxoplasma gondii intracellular survival by downmodulating cell death in infected host neutrophils (Alves et al., 2010). A number of studies have demonstrated the ability of viral infection to affect galectin expression and/or secretion (Fogel et al., 1999; King et al., 2009). In the case of human T-cell lymphotropic virus type 1 (HTLV-1), not only is expression of galectin-1 and -3 upregulated by viral infection (Hsu et al., 1996; Gauthier et al., 2008), but galectin-1 has been implicated in the HTLV-1 infection process by facilitating cell-cell interaction through crosslinking of cell-surface glycoproteins (Gauthier et al., 2008).

Galectins and cancer

The extensive and varied roles played by galectins in immunity, cell-cycle regulation, apoptosis, and adhesion have drawn attention to possible functions of galectins in carcinogenesis and cancer progression (Lahm et al., 2004). Galectins can influence tumorigenesis intracellularly by interacting with Ras-subfamily proteins and other cell cycle regulators (Yang et al., 2008). Galectins-1 and -3 have also been shown to promote angiogenesis through intracellular (Thijssen et al., 2006) and extracellular processes, respectively (Yang et al., 2008). Furthermore, galectin-mediated effects, including but not limited to extracellular glycoprotein binding, have been linked to tumor cell migration and metastasis (Zou et al., 2005; Yang et al., 2008).
Expression differences are seen in many galectins across a wide range of cancers when compared with normal tissue (Xu et al., 1995 & 2000; Plzak et al., 2004; Prieto et al., 2006). So far, the roles of galectins -1 and -3 in tumorigenesis and cancer progression have been widely researched. As previously mentioned, galectin-1 has been shown to induce apoptosis of effector T-cells (Perillo et al., 1995), thereby influencing cancer cell survival via modulation of anti-tumor T-cell responses. This induction of apoptosis may occur via multiple pathways, both intracellular and extracellular (Salatino & Rabinovich, 2011). More directly, pro-apoptotic effects of galectin-1 may be manifested in cancer cells themselves (van den Brûle et al., 2004), although this is strongly dependent upon other factors such as altered regulation of glycosyltransferase genes (Valenzuela et al., 2007) that affect galectin-1 binding to surface glycoproteins and thereby also affect the triggering of downstream apoptotic signaling cascades (Hsu et al., 2006). Galectin-3 is a known apoptotic regulator with both suppressive and inductive effects on apoptosis (Hsu et al., 2006), and increased galectin-3 expression has been observed in many different cancer types (van den Brûle et al., 2004). Increased expression of this galectin has also been associated with poor prognosis in many cancers, probably as a result of increased metastatic and invasive potential due to heightened adhesion (Rabinovich et al., 2002). However, decreased galectin-3 expression has been correlated with aggressive phenotype in some cancers (Castronovo et al., 1996), indicating that other factors may also be involved in the regulation of metastasis by galectin-3, and Matarrese et al. (2000) proposed that overexpression of galectin-3 may in fact prevent metastasis in some cancers by inhibiting detachment of potentially metastatic tumor cells via improved cell-cell adhesion.
Diminished expression of galectin-3 has been reported in multiple cancers (Danguy et al., 2002; van den Brûle et al., 2004), suggesting markedly differing effects of galectin-3 depending on cancer type, tumor environment, and intracellular versus extracellular expression of galectin-3 (Rabinovich et al., 2002). Although not studied in as much detail as galectins -1 and -3, other galectins have been shown or suspected to function in tumor growth and development (Danguy et al., 2002). As with galectin-1, the effects of galectin-2 upon T-cell development and survival have been linked to tumor progression (Salatino & Rabinovich, 2011). Galectin-2 can modulate immune tolerance to tumors by triggering caspase-dependent apoptotic pathways, thereby inducing programmed death of activated T-cells (Sturm et al., 2004). Galectin-7 has also been proposed to negatively regulate tumor progression by suppressing proliferation (Saussez and Kiss, 2006). Hadari et al. (2000) reported that galectin-8-induced inhibition of adhesion also resulted in the induction of apoptosis in human cancer cells. Additionally, a number of studies have noted a correlation between increased galectin-9 expression in tumors and a reduction in invasiveness, and galectin-9 has also been found to induce apoptosis and cell cycle arrest in adult T-cell leukemia by reducing the expression of various regulatory factors (Rabinovich et al., 2007). Thus, galectin-9 has been considered to be a prognostic factor in cancer progression (Yamauchi et al., 2006; Nobumoto et al., 2008).

Transcriptional regulation of galectin genes

Transcriptional regulation of galectin expression occurs at upstream promoter regions (Chiariotti et al., 2004).
The human galectin-3 gene (LGALS3) promoter has been functionally characterized (Kadrofske et al., 1998), although a search of the publications referenced at the PubMed literature database reveals that considerably less research has been carried out regarding other galectin promoters. Unlike other galectin genes, LGALS3 contains an internal promoter element in its second intron (Raimond et al., 1995) regulating expression of the galectin-3 internal gene GALIG, which encodes the unrelated mitochondrial-targeted protein mitogaligin (Duneau et al., 2005). Regulation of this internal gene occurs independently of LGALS3 regulation (Guittaut et al., 2001). In humans, cytosines adjacent to guanines (CpG sites) are usually methylated, thereby inhibiting transcription. Methylation of promoter CpGs can lead to transcription alteration via gene silencing (Saxonov et al., 2006). However, regions of higher CG density known as CpG islands, often located near transcription start sites, are unmethylated in expressed genes. Single-nucleotide polymorphisms (SNPs) overlaying CpG islands or individual CpG sites may affect transcription by altering methylation patterns, and changes in promoter methylation have been shown to affect galectin expression in various cancers (Ruebel et al., 2005; Ahmed et al., 2007; Demers et al., 2009); indeed, hypermethylation of the LGALS3 promoter in prostate adenocarcinoma has been proposed as a marker for early detection (Ahmed, 2010). Overlap of SNPs with transcription factor binding sites may also affect gene expression (Ameur et al., 2009). It has been shown that even single-nucleotide genomic variations in regulatory regions may have significant effects on transcription factor binding ability, resulting in differences in gene expression levels, although the effects of genetic variation differ depending on the DNA binding motifs present in different transcription factors (Kasowski et al., 2010).
Despite the relative paucity of studies in this area, a number of regulatory elements have been found in galectin promoters (Chiariotti et al., 2004). Here, it is reported that in silico analytical techniques point towards a possible association between the rs3763959 polymorphism upstream of the galectin-9 start site and human breast carcinoma. No previous work has been done regarding potential effects of galectin promoter polymorphisms on cancer susceptibility, although association studies have been done for promoters in other genes involved in apoptotic pathways (Cerhan et al., 2008; Cordano et al., 2005; Novak et al., 2009; Enjuanes et al., 2008) and an LGALS2 coding region polymorphism has been linked to ischemic stroke (Yamada et al., 2008) and myocardial infarction in some (Ozaki et al., 2004) but not all (Mangino et al., 2007; Sedlacek et al., 2007) human populations. Moreover, a search of the PubMed citation database reveals that no published in silico analyses using GEO data have been carried out with regard to galectin promoter SNPs, further demonstrating the originality of this project.

OBJECTIVES

1: Parse the Oncomine database for expression differences in galectins 1-4 and 7-10 between cancerous and normal tissue.
2: Locate SNPs in upstream regulatory regions of genes coding for the above eight galectins using the International HapMap Project database.
3: Screen SNP sequences for putative overlapping transcription factor binding sites in upstream regulatory regions using fSNP, Consite, and TESS (TRANSFAC and IMD) search platforms.
4: Use MEGA alignment software to locate individual CpG sites overlaying upstream SNPs. Search for known CpG islands coinciding with the SNP sites using the UCSC Genome Browser, then inspect sequences for putative CpG islands using CpG Island Searcher, CpG Plot, and CpG Island Explorer.
5: Determine if there is any association between presence of galectin promoter SNPs and cancers showing significant galectin expression differences by screening Illumina whole-genome SNP data archived at the Gene Expression Omnibus.

MATERIALS AND METHODS

All work was carried out in the laboratory of Dr. Fu-Tong Liu (Department of Dermatology, University of California Davis Medical Center).

Oncomine

Oncomine (http://www.oncomine.org) is an online microarray database containing expression data from numerous studies (Rhodes et al., 2004). Differential expression analyses enable side-by-side comparisons of gene expression between normal and cancerous tissues for a wide variety of malignancies (Rhodes et al., 2007). Oncomine data is organized by study and tissue type. Analyses may compare expression of a particular gene in cancer versus normal tissue, normal versus normal, cancer versus cancer, or by other criteria such as molecular alteration or patient prognosis (Rhodes et al., 2007). Gene expression is presented in log2 transformed expression units. Expression profiles are visualized as boxplots showing the median, first and third quartiles, and 10th and 90th percentiles in addition to outliers. To allow flexibility in significance testing, p-value cutoffs can be set at various levels by the user. The Oncomine database was screened for expression of galectins 1-4 and 7-10 in multiple non-lymphatic cancers in comparison with normal tissue. Although a conservative p-value threshold of 1E-4 was initially chosen for stringency as per Duhagon et al. (2010), this eliminated too many Oncomine studies and the p-value threshold was subsequently relaxed to 1E-2. Expression differences for Oncomine datasets were obtained by subtracting median cancer expression from median non-cancer expression. The resultant value is positive if the gene is overexpressed in normal tissue and negative if it is overexpressed in cancerous tissue.
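The sign convention just described can be sketched in a few lines of Python. This is a minimal illustration rather than the Oncomine implementation: the function names, the sample values, and the `threshold` parameter (set here to the 0.9 log2-unit cutoff used in this study) are assumptions introduced for the example.

```python
from statistics import median


def expression_difference(normal, cancer):
    """Median normal expression minus median cancer expression (log2 units).

    Positive: gene is overexpressed in normal tissue.
    Negative: gene is overexpressed in cancerous tissue.
    """
    return median(normal) - median(cancer)


def classify(diff, threshold=0.9):
    """Label the direction of change in cancer relative to normal tissue.

    The threshold is an assumption for illustration; a difference must be
    strictly larger in magnitude than the threshold to be labelled.
    """
    if diff > threshold:
        return "down"  # higher in normal tissue, so down in cancer
    if diff < -threshold:
        return "up"    # higher in cancerous tissue, so up in cancer
    return "not significant"


# Hypothetical log2 expression values, not taken from any Oncomine study.
normal_samples = [8.0, 8.5, 9.0]
cancer_samples = [6.5, 7.0, 7.2]
diff = expression_difference(normal_samples, cancer_samples)
label = classify(diff)
```

With these made-up values the median difference is positive, so the gene would be labelled as down in cancer.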
For individual datasets, expression differences in excess of 0.9 log-normalized expression units were considered significant. This threshold was chosen as it was broad enough to guarantee a range of suitable GEO datasets, while retaining a level of stringency matching or exceeding similar Oncomine comparisons in the published literature (Draheim et al., 2010; Tang et al., 2010).

SNP selection using the HapMap SNP database

The International HapMap Project (http://www.hapmap.org) database was used to locate SNPs in galectin promoter regions. This project is a large-scale global collaborative effort involving multiple research groups aimed at cataloging SNP variations in the human genome (International HapMap Consortium, 2003 & 2005). The initial Phase 1 haplotype map was created by genotyping 270 individuals from four widely separated populations: Japanese in Tokyo (JPT), residents of Utah with Northern/Western European ancestry (CEU), Yoruba in Ibadan, Nigeria (YRI), and Beijing residents of Han ancestry (CHB). It should be cautioned that due to recent mass migration to urban areas as well as the inclusion of university students in the CHB population, the composition of HapMap populations may not necessarily be representative of the local populations in the areas surveyed (He et al., 2009). All SNPs included in HapMap conformed to Hardy-Weinberg equilibrium (HWE) expectations (exact test, p-value threshold of 0.001) for individual populations sampled (International HapMap Consortium, 2003 & 2005), although due to population stratification this is not necessarily true when multiple populations are aggregated. The default minor allele frequency (MAF) cutoff for Phase I HapMap is 0.05 for each of the four populations sampled, a threshold intended to screen out rare alleles and simplify haplotype mapping (International HapMap Consortium, 2003 & 2005).
In order to maintain consistency this cutoff was used for SNP selection even for later additions to the database, excluding rarer SNPs or those restricted to a single population; a 0.05 or 0.01 MAF threshold is often used for clinical studies since detection and analysis of less common alleles may require unrealistically large sample sizes (Nebert et al., 2008). The HapMap database was consulted to locate SNPs within 2 kb upstream of LGALS1-4, 7-10, and 12 start sites. The LGALS3 internal regulatory region was ignored because, as previously mentioned, it controls transcription of the GALIG internal gene and has not been shown to affect regulation of LGALS3 itself (Chiariotti et al., 2004). The presence of each selected SNP from HapMap was confirmed using the UCSC Genome Browser (http://genome.ucsc.edu) at the University of California, Santa Cruz.

Transcription factor binding site location

Four online tools were used to detect putative transcription factor (TF) binding sites coinciding with SNPs (in both the reference and nonreference alleles) in galectin promoter regions: TFSearch (via fSNP at http://compbio.cs.queensu.ca/F-SNP/), the TRANSFAC 6.0 and IMD databases (via TESS, the Transcriptional Element Search System at UPenn; http://www.cbil.upenn.edu/cgi-bin/tess/tess?RQ=WELCOME), and the Consite platform at the University of Bergen (http://asp.ii.uib.no:8090/cgi-bin/CONSITE/consite/). Multiple tools were used due to the uncertainty associated with inherent limitations in site scanning algorithms (Carmack et al., 2007). Even with this measure it is cautioned that in silico TF binding site recognition models remain prone to various errors and therefore may not accurately reflect real-life processes (Hannenhalli, 2008); furthermore, unlike CpG sites, replacement of a single nucleotide in a TF binding site may not sufficiently alter the site to affect or prevent binding.
Nonetheless, these search platforms have been used to find candidate sites in multiple published studies (Li et al., 2009; Banerjee & Nandagopal, 2007). It should be noted that TFSearch relies on the earlier TRANSFAC 3.3, which is several years old; the current version is 7.0 as of 2005. However, the results do not show a great deal of difference between the 3.3 and 6.0 versions. The default significance thresholds (85% for all with the exception of Consite, which was set to 80%) were used for each program.

Location of CpG sites and islands

In silico detection algorithms have been used to find CpG sites and islands in past studies (Wang & Leung, 2004; Li et al., 2006). Three of these search platforms were used to investigate CpG islands in the vicinity of galectin promoter SNPs: CpG Island Searcher (http://cpgislands.usc.edu; web-based), EMBOSS CpG Plot (http://www.ebi.ac.uk/Tools/emboss/cpgplot/index.html; web-based), and CpG Island Explorer (http://bioinfo.hku.hk/cpgieintro.html; downloaded Java application). Individual CpG sites were confirmed using MEGA alignment software (http://megasoftware.net). As the definition of the length of a CpG island varies by author (Takai & Jones, 2002), both the default (55% GC, 0.65 obs/exp, 500 bp length, 100 bp gap for CpG Island Searcher; 100 bp window, 0.6 obs/exp ratio, 50% GC, 200 bp length for CpG Plot; 50% GC, 0.6 obs/exp, 500 bp length for CpG Island Explorer) and least stringent (50% GC, 0.6 obs/exp, 200 bp length, 100 bp gap for CpG Island Searcher; 100 bp window, 0.6 obs/exp ratio, 30% GC, 50 bp length for CpG Plot; 50% GC, 0.6 obs/exp, 200 bp length for CpG Island Explorer) settings were used for each program.

Genomic data and association analysis

The NCBI's Gene Expression Omnibus (GEO; accessed at http://www.ncbi.nlm.nih.gov/gds) is an extensive public database of expression data from a diverse range of published and unpublished studies, including whole-genome studies.
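To make the CpG island criteria given under "Location of CpG sites and islands" concrete, the sketch below computes the two quantities those settings refer to: percent GC and the observed/expected CpG ratio, taken here as (number of CpG dinucleotides × sequence length) / (number of C × number of G). It is an illustration only and not the algorithm of any of the three programs, which scan sequences with sliding windows and merge nearby hits; the function names are hypothetical, and the default arguments simply mirror the CpG Island Searcher default settings quoted in the text.

```python
def cpg_stats(seq):
    """Return (percent GC, observed/expected CpG ratio) for one sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_percent = 100.0 * (c + g) / n if n else 0.0
    # Expected CpG count under independence is (c * g) / n.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_percent, obs_exp


def meets_island_criteria(seq, min_gc=55.0, min_obs_exp=0.65, min_len=500):
    """Check a single stretch of sequence against length, GC and obs/exp cutoffs.

    Assumption: the whole sequence is treated as one candidate region;
    no windowing or gap merging is attempted.
    """
    gc_percent, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_percent >= min_gc and obs_exp >= min_obs_exp


# Toy sequences for illustration only.
toy_rich = "CG" * 400   # 800 bp, entirely CpG dinucleotides
toy_poor = "AT" * 400   # 800 bp, no C or G at all
rich_passes = meets_island_criteria(toy_rich)
poor_passes = meets_island_criteria(toy_poor)
```

Relaxing `min_gc`, `min_obs_exp`, and `min_len` corresponds to the least stringent settings described in the text.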
As Illumina (http://www.illumina.com) is the only manufacturer to include the majority of galectin promoter SNPs of interest, only datasets derived from studies performed using Illumina bead arrays were used. This had the added benefit of obviating the need for meta-analytical techniques, since nearly identical arrays could be analyzed using a log-likelihood ratio test (G-test). One drawback is that not all the SNPs in question are included on the Illumina arrays, but these are the most comprehensive of the more commonly used systems available. Genomic SNP data from Illumina BeadArrays covering bladder cancer, esophageal adenocarcinoma, lung squamous cell carcinoma, melanoma, and renal clear cell carcinoma were extracted from studies archived at the GEO, as significant differential expression had previously been found in these cancers using Oncomine. In addition, galectin promoter SNPs were investigated in breast carcinoma since, as mentioned in the introduction, multiple studies have pointed to galectin-9 as a prognostic factor in this disease (Yamauchi et al., 2006; Irie et al., 2005). In the two datasets (GSE13282, Gordan et al., 2008; GSE16019, Chen M et al., 2009) where the authors did not include genotype calls, these were obtained manually by calculating the B allele frequency as per LaFramboise et al. (2010). Genotype and allele frequencies were compared between the GEO datasets (Table 1) and both individual and aggregated HapMap Phase 1 populations as controls, with Williams' corrected G-test for independence used to determine association as per Viviani Anselmi et al. (2008) under a null hypothesis of no association and p-values of 0.05 or less considered significant (Daly, 2009). A Williams' corrected two-degree G-test for goodness-of-fit was used to determine Hardy-Weinberg equilibrium as per McDonald (2009) at a p-value threshold of 0.05 or less.
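The Williams' corrected G-test for independence can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of the spreadsheet used in this study: the function name and the counts are hypothetical, every row and column total is assumed to be nonzero, and the correction factor is the standard form for an R × C table, q = 1 + (n·Σ(1/row total) − 1)(n·Σ(1/column total) − 1) / (6n(R − 1)(C − 1)), with the adjusted statistic G/q compared against the usual chi-square critical values at p = 0.05.

```python
import math

# Chi-square critical values at p = 0.05 for 1 and 2 degrees of freedom.
CRITICAL_0_05 = {1: 3.841, 2: 5.991}


def williams_g_test(table):
    """Williams' corrected G-test of independence for an R x C count table.

    `table` is a list of rows of observed counts, e.g. cases and controls
    (rows) by allele or genotype (columns). Returns (adjusted G, df).
    Assumes all row and column totals are greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # a zero cell contributes nothing to G
                expected = row_totals[i] * col_totals[j] / n
                g += observed * math.log(observed / expected)
    g *= 2.0

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    q = 1.0 + (
        (n * sum(1.0 / r for r in row_totals) - 1.0)
        * (n * sum(1.0 / c for c in col_totals) - 1.0)
    ) / (6.0 * n * df)
    return g / q, df


# Hypothetical allele counts (not data from this study):
# rows = cancer dataset, control population; columns = allele A, allele B.
allele_counts = [[30, 10], [10, 30]]
g_adj, df = williams_g_test(allele_counts)
significant = g_adj >= CRITICAL_0_05[df]
```

A 2 × 2 allele table gives 1 degree of freedom and a 2 × 3 genotype table gives 2, matching the degrees of freedom used for allele and genotype frequency testing in this study.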
In each particular cancer of interest, analyses were still performed on promoter SNPs in galectins that did not exhibit significant differential expression for that cancer, since these were still useful as negative controls.

Table 1. Relevant Illumina cancer SNP datasets archived at the Gene Expression Omnibus. It should be noted that rs929039, rs10489789, and rs428007 are not featured on the Illumina HumanOmni1-Quad array used in Castillo et al. (2010).

| Study | GEO accession no. | Tissue type | Sample size | Illumina array |
|---|---|---|---|---|
| Letouze et al. (2010) | GSE19189 | Bladder | 20 | HumanCNV370-QuadV3_C |
| Popova et al. (2009) | GSE18799 | Breast | 21 | HumanHap300-Duov2 |
| Waddell et al. (2010) | GSE19177 | Breast | 34 | HumanCNV370-Duov1 |
| Staaf et al. (2008) | GSE11976 | Breast | HCC1395 cell line | HumanCNV370-Duov1 |
| Nancarrow et al. (2008) | GSE10506 | Esophagus | 23 adenocarcinoma | HumanHap300 |
| Gordan et al. (2008) | GSE13282 | Kidney | 21 ccRCC | HumanHap550-Duov3 |
| Chen M et al. (2009) | GSE16019 | Kidney | 80 paired ccRCC | HumanHap300-Duov2 |
| Castillo et al. (2010) | GSE21168 | Lung | 4 lung SCC | HumanOmni1-Quad |
| Stark & Hayward (2007) | GSE9003 | Melanoma | 76 melanoma cell lines | HumanHap300 |

Williams'-corrected G-tests were carried out using Microsoft Excel (http://www.microsoft.com) software as per McDonald (2009). G-test degrees of freedom were 1 and 2 respectively for allele and genotype frequency testing. Williams' corrected G values were compared against a chi-square distribution table (Mann, 2004), and thresholds of 3.841 or greater and 5.991 or greater (corresponding to p = 0.05) were selected for 1 and 2 degrees of freedom, respectively.

RESULTS

Differential expression of galectins in Oncomine cancers

In order to find possible cancer types for galectin promoter SNP association analysis, the Oncomine database was screened for differential expression of galectins 1-4 and 7-10 in multiple cancers in comparison with normal tissue.
At a significance threshold consisting of a difference in expression in excess of 0.9 log-normalized expression units, twenty-one cancers of interest (Table 2) were found, some of which were genotyped in studies archived at the NCBI Gene Expression Omnibus.

Table 2. Differential expression of galectins in nonlymphatic tissue (cancer vs. normal), from Oncomine. Galectin-3 is excluded here due to lack of LGALS3 promoter SNPs on most Illumina BeadChip arrays.

Malignancy LGALS1 bladder brain, glioblastoma up brain, glioblastoma multiforme brain, glioma brain, oligodendroglioma down endometrium down esophagus head-neck up kidney up liver lung adenocarcinoma lung SCC melanoma up mesothelioma ovarian pancreas, adenocarcinoma pancreas, ductal carcinoma prostate salivary seminoma/testis tongue SCC LGALS2 LGALS4 LGALS7 LGALS8 up LGALS9 CLC up down down up up down down down down up down up down down up up down up up up down up down down up up

Oncomine data from Table 2 is from individual studies as follows:

Bladder: Dyrskjøt et al., 2004
Brain, glioblastoma: Bredel et al., 2005
Brain, glioblastoma multiforme: Liang et al., 2005; Shai et al., 2003
Brain, glioma: Rickman et al., 2001
Brain, oligodendroglioma: Bredel et al., 2005
Endometrium: Mutter et al., 2001
Esophagus: Wang et al., 2006
Head-neck: Chung et al., 2004; Ginos et al., 2004; Talbot et al., 2005; Toruner et al., 2004
Kidney: Lenburg et al., 2003
Liver: Chen et al., 2002
Lung adenocarcinoma: Garber et al., 2001
Lung squamous-cell carcinoma (SCC): Garber et al., 2001; Wachi et al., 2005
Melanoma: Haqq et al., 2005; Talantov et al., 2005
Mesothelioma: Gordon et al., 2005
Ovarian cancer: Hendrix et al., 2006
Pancreas, adenocarcinoma: Iacobuzio-Donahue et al., 2003
Pancreas, ductal carcinoma: Ishikawa et al., 2005
Prostate: Varambally et al., 2005
Salivary gland: Frierson et al., 2002
Seminoma/testis: Korkola et al., 2006; Skotheim et al., 2005; Sperger et al., 2003
Tongue squamous-cell carcinoma (SCC): Talbot et al., 2005

In general the data obtained from the Oncomine differential expression analyses were consistent with previous reports, although in some cases the degree of expression difference did not entirely match what had been published in previous studies (Valenzuela et al., 2007). Significant overexpression of galectin-4 was noted in cancers of the liver (Chen et al., 2002) and esophagus, although galectin-7 was greatly underexpressed in esophageal adenocarcinoma (Wang et al., 2006). It should be noted that using the p-value threshold of 1E-2 may have resulted in the elimination of some studies, although this threshold had been altered from an earlier, even more stringent, threshold of 1E-4.

Promoter SNPs located on HapMap database

In total, seven galectin promoter SNPs were located on the HapMap Phase 1 database which are also present in the commonly used high-resolution Illumina BeadArray (Table 3). The presence of each selected SNP was confirmed using the UCSC Genome Browser (http://genome.ucsc.edu). One CLC (galectin-10) promoter polymorphism (rs428007) was found not to be in Hardy-Weinberg equilibrium in the HapMap Phase 1 aggregated population; however, due to previously mentioned population stratification, this is not considered to be significant.

Table 3. Illumina BeadArray SNPs in upstream promoter regions of galectin genes.

| Gene | SNP | Location | Alleles (ref/other) |
|---|---|---|---|
| LGALS1 | rs4820294 | chr22:36400989 | G/A |
| LGALS1 | rs929039 | chr22:36401457 | T/C |
| LGALS2 | rs2235338 | chr22:36295826 | G/A |
| LGALS4 | rs10403583 | chr19:43983611 | A/G |
| LGALS8 (long) | rs10489789 | chr1:234746800 | G/A |
| LGALS9 | rs3763959 | chr17:22981461 | A/G |
| CLC/LGALS10 | rs428007 | chr19:44913207 | C/T |

Multiple predicted transcription factor binding sites

The default significance thresholds were used for each search platform; this was 85% for all except Consite, which was set to 80%. Only four SNPs showed putative sites that were predicted by two or more programs (Table 4).

Table 4. Predicted transcription factor binding sites overlapping with SNPs (both reference and nonreference alleles) in upstream promoter regions of galectin genes. Only SNPs with sites predicted by multiple platforms are shown.

Gene SNP FSNP/TFSearch (TRANSFAC 3.3) LGALS1 rs929039 ref:cap, c-Myb LGALS2 rs2235338 other:c-Myb LGALS9 rs3763959 CLC rs428007 Consite TESS/TRANSFAC 6.0 TESS/IMD Sites predicted by three platforms (no TF changes) (no TF changes) ref:c-Myb; other:cMyb other:c-Myb (none) other:cMyb other:cMyb other:E2F other:E2F (no TF changes found) (none) other:E2F ref:C/EBPb; other:C/EBPb ref:USF, Max; other:Thing1E47 ref:Lmo2, GATA-3, GATA-2, AREB6, cap, E12, E47, ITF-2, Tal-1; other:HiNF-A, cap ref:LBP-1, CP2 other:LBP1, CP2, cMyb ref:CACbinding; other:E2FDRTF ref:Tal-1, RFX2, NFX3, NFuE4, E2A, E12, LBP1; other:MBF1 Sites predicted by two or more platforms ref: c-Myb (none) ref:Tal-1, E12

Putative methylation sites coinciding with galectin promoter SNPs

No unusually GC-rich regions were found using the default settings (55% GC, 0.65 obs/exp, 500 bp length, 100 bp gap for CpG Island Searcher; 100 bp window, 0.6 obs/exp ratio, 50% GC, 200 bp length for CpG Plot; 50% GC, 0.6 obs/exp, 500 bp length for CpG Island Explorer) on any of the programs, although this was not unexpected as the UCSC Genome Browser (http://genome.ucsc.edu) did not list any CpG islands at any of the SNP sites. However, when the applications were run again using the least stringent settings (50% GC, 0.6 obs/exp, 200 bp length, 100 bp gap for CpG Island Searcher; 100 bp window, 0.6 obs/exp ratio, 30% minimum GC, 50 bp length for CpG Plot; 50% GC, 0.6 obs/exp, 200 bp length for CpG Island Explorer), a number of putative islands were found at or near SNP locations (Table 5). In addition, most SNPs, with the exception of rs10403583 and rs428007, were found to overlap individual CpG sites.

Table 5. Individual CpG sites and putative CpG islands at SNP locations.
Gene LGALS1 LGALS1 LGALS2 LGALS4 LGALS8 (long) LGALS9 CLC/LGALS10 SNP rs4820294 rs929039 rs2235338 rs10403583 rs10489789 rs3763959 rs428007 CpG site at SNP location Putative islands, CpG Island Searcher Putative islands, CpG Island Explorer (none) Putative islands, CpG Plot other (downstream) ref, other (downstream) ref other (downstream) other ref (none) ref (none) (none) (none) ref, other (none) ref (none) (none) (none) other (none) other (none) (none) (none) (none) (none) other (none)

Significant genotype and allele frequency differences uncovered

Association analysis using Williams’ corrected G-test revealed significant differences in both genotype (Appendix B, 1 to 7) and allele (Appendix C, 1 to 7) frequencies at galectin promoter SNP locations between individual GEO cancer datasets and both individual and aggregated HapMap controls. No association analysis could be conducted on the Staaf (2008) breast dataset GSE11976, as it consisted only of the HCC1395 breast carcinoma line. The Castillo (2010) lung dataset, GSE21168, included only four tumor samples and used a slightly different Illumina array that excluded some of the investigated promoter polymorphisms: only rs2235338 (LGALS2), rs3763959 (LGALS9), rs4820294 (LGALS1), and rs10403583 (LGALS4) were included. Results obtained from statistical analysis of GSE21168 are therefore presumed to be less robust than those involving other GEO datasets.

Most of these associations between promoter SNPs and cancers are probably spurious, despite their apparent statistical significance. The majority of putative associations occur in comparisons using either non-European HapMap populations or the aggregated HapMap Phase 1 population as control groups.
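The dependence of these comparisons on the choice of control population can be illustrated with a minimal sketch of Williams' corrected G-test in Python. This is not the software used for the thesis analyses; the function name is illustrative, the correction factor follows the standard textbook form, and the closed-form p-values cover only the one and two degrees of freedom that arise for 2 x 2 allele tables and 2 x 3 genotype tables. The counts are the rs428007 genotype counts listed in Appendix B (1a), and the printed values should agree closely with the G and p-values reported there for the CEU and YRI comparisons.

```python
import math


def williams_g_test(table):
    """Williams-corrected G-test of independence for an r x c table of counts.

    Returns (corrected G, p-value). Assumes every row and column total is > 0.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # G = 2 * sum(observed * ln(observed / expected)); empty cells add nothing.
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                expected = row_totals[i] * col_totals[j] / n
                g += 2.0 * observed * math.log(observed / expected)

    df = (len(row_totals) - 1) * (len(col_totals) - 1)

    # Williams' correction: divide G by the factor q.
    q = 1.0 + (
        (n * sum(1.0 / r for r in row_totals) - 1.0)
        * (n * sum(1.0 / c for c in col_totals) - 1.0)
    ) / (6.0 * n * df)
    g_adj = max(g / q, 0.0)  # guard against tiny negative rounding error

    # Closed-form chi-square tail probabilities for the two cases needed here.
    if df == 1:      # 2 x 2 allele table
        p = math.erfc(math.sqrt(g_adj / 2.0))
    elif df == 2:    # 2 x 3 genotype table
        p = math.exp(-g_adj / 2.0)
    else:
        raise ValueError("this sketch handles only 1 or 2 degrees of freedom")
    return g_adj, p


# Genotype counts (T/T, T/C, C/C) for rs428007, from Appendix B (1a).
gse16019 = [18, 30, 29]
hapmap_ceu = [11, 46, 55]
hapmap_yri = [3, 34, 76]

for label, control in (("CEU", hapmap_ceu), ("YRI", hapmap_yri)):
    g_adj, p = williams_g_test([gse16019, control])
    print(f"GSE16019 vs. HapMap {label}: G = {g_adj:.3f}, p = {p:.3E}")
```

The same study genotypes give a much larger G against the YRI controls than against the CEU controls, which is the pattern that motivates restricting the comparisons to an ancestry-matched control population.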
For galectin promoter SNPs, there are significant differences in allele and genotype frequencies between different HapMap populations (see Appendices B and C), and it has been shown that differences in allele frequencies between even populations in relatively close geographical proximity (Sedlacek et al., 2007) can result in difficulties extrapolating the results of association studies from one population to another. The breast cell line used in the GSE11976 (Staaf et al., 2008) study was derived from a female of European descent, and the GSE16019 (Chen M et al., 2009) dataset samples were derived from patients of mostly white and Hispanic ancestry. The other GEO cancer datasets do not explicitly list the ethnic background of the study participants; however, as most of the studies (GSE21168: Castillo et al., 2010; GSE13282: Gordan et al., 2008; GSE19189: Letouze et al., 2010; GSE10506: Nancarrow et al., 2008; GSE18799: Popova et al., 2009) involved hospital patients in regions with a predominantly European-descended demographic majority, it can be inferred that the HapMap CEU population is a better control for these datasets than either the non-European populations or the aggregated HapMap population.

When non-CEU controls are excluded, five SNPs show significant differences in genotype and/or allele frequency in bladder carcinoma (GSE19189: Letouze et al., 2010), breast carcinoma (GSE18799: Popova et al., 2009; GSE19177: Waddell et al., 2010), esophageal adenocarcinoma (GSE10506: Nancarrow et al., 2008), and melanoma (GSE9003: Stark & Hayward, 2007) datasets (Table 6).

Table 6. SNPs exhibiting significant genotype and/or allele frequency differences in study datasets when compared to HapMap CEU population only.
SNP rs4820294 (LGALS1) rs2235338 (LGALS2) rs10403583 (LGALS4) rs10489789 (LGALS8) rs3763959 (LGALS9) Letouze GSE19189 Nancarrow GSE10506 (none) Genotype only (none) (none) (none) Popova GSE18799 Stark GSE9003 Genotype only Waddell GSE19177 Genotype and allele (none) (none) Allele only (none) Allele only Genotype only (none) Genotype only Genotype and allele (none) (none) (none) (none) (none) (none) Genotype only (none) Genotype only

As previously mentioned, statistical testing could not be carried out on the GSE11976 dataset (Staaf et al., 2008) due to the fact that it consisted of only one sample. Nonetheless, SNP genotype data was extracted from this dataset (Appendix D).

The single LGALS9 upstream SNP analyzed, rs3763959, was found not to conform to Hardy-Weinberg equilibrium expectations at a p-value threshold of 0.05 in the single GSE9003 melanoma dataset (Stark & Hayward, 2007) or in the two breast cancer datasets GSE18799 (Popova et al., 2009) and GSE19177 (Waddell et al., 2010) downloaded from GEO. In the Staaf (2008) dataset, GSE11976, the genotype at rs3763959 was G/G (Appendix D), further hinting at a link, although this dataset consisted of only a single breast cancer cell line (HCC1395).

A number of additional SNPs [rs4820294 (LGALS1) in GSE10506 (Nancarrow et al., 2008) and GSE9003 (Stark & Hayward, 2007), rs10403583 (LGALS4) in GSE9003 (Stark & Hayward, 2007)] were also not in Hardy-Weinberg equilibrium at p<0.05 in some datasets (Appendix E, 1 to 7); however, of these galectins, only the putative rs4820294/melanoma link showed any correlation with the Oncomine differential expression data (Table 2). Interestingly, this SNP showed a significant genotype frequency difference between the GSE9003 study population and the HapMap CEU population (Table 6).

DISCUSSION

The above results indicate a statistically significant genotypic association of the LGALS9 upstream polymorphism rs3763959 with breast carcinoma.
This association is observed in two different datasets, GSE18799 (Popova et al., 2009) and GSE19177 (Waddell et al., 2010). The distribution of this SNP is significantly skewed toward the G/G genotype and does not conform to Hardy-Weinberg expectations in the two study populations, although the allele frequencies do not significantly differ from the HapMap CEU population in either dataset. Significant deviation from Hardy-Weinberg equilibrium may be a sign of consanguinity in a sampled population, or may indicate genotyping or procedural error (Talseth et al., 2006). It is not likely that the observed deviation from the expected Hardy-Weinberg distribution in the breast cancer datasets derives from genotyping error, since this deviation occurs in two datasets from different studies, both of which utilized Illumina BeadArray genotyping kits. Furthermore, the Popova et al. (2009) and Waddell et al. (2010) patient populations were from different countries (France and Australia, respectively, although fourteen Dutch and United States samples were included in the Waddell study), decreasing the possibility of consanguinity. Therefore, this discrepancy may be more likely to indicate an actual correlation (Gyorffy et al., 2003) between the rs3763959 polymorphism and breast carcinoma, although other possibilities must not be ruled out.

A statistically significant association of the LGALS1 upstream polymorphism rs4820294 was also observed in connection with the GSE9003 (Stark & Hayward, 2007) melanoma dataset, and it is worth noting that the rs4820294 SNP also does not conform to Hardy-Weinberg expectations in this dataset. However, it should be cautioned that as this dataset derives from a single study, genotyping or methodological errors are less easily ruled out.

Galectin-9 and breast cancer

Galectin-9, or ecalectin, was at first identified as an eosinophil chemoattractant (ECA) and activator (Matsumoto et al., 1998).
This galectin has been described as having anti-metastatic potential in breast cancer (Yamauchi et al., 2006), melanoma (Kageshita et al., 2002), oral squamous cell carcinoma (Kasamatsu et al., 2005), and lung cancer (Nobumoto et al., 2008). Inhibition of metastasis most likely occurs, at least in some cancers, by galectin-9 competitively blocking adhesion to the extracellular matrix (Nobumoto et al., 2008). Anti-proliferative activity has also been observed in multiple myeloma cells via the JNK and p38 MAP kinase signaling pathways (Kobayashi et al., 2010).

As previously mentioned, galectin-9 is considered a prognostic factor in breast cancer development (Yamauchi et al., 2006; Nobumoto et al., 2008), although interestingly none of the Oncomine datasets surveyed revealed significant galectin-9 expression differences between normal and cancerous breast tissue. In breast cancer, increased expression of this galectin was inversely associated with distant metastasis (Irie et al., 2005). However, in contrast to previous studies involving melanoma cells (Kageshita et al., 2002), Irie et al. (2005) found no detectable amounts of galectin-9 on the surface of breast cancer cells despite high cytoplasmic expression, suggesting that antimetastatic functions of galectin-9 may involve different processes depending on cancer type. As of this writing, the mechanisms of galectin-9 antimetastatic activity in breast cancer have still not been elucidated.

E2F binding site at rs3763959

Multiple search platforms predict an E2F binding site for the nonreference G allele of rs3763959. The E2F family consists of both transcription activators and repressors with diverse functions in the regulation of cell proliferation and apoptosis (Chen HZ et al., 2009). Paradoxically, increased expression of both the transcriptional activator E2F1 (Han et al., 2003) and the transcriptional inhibitor E2F4 (Rakha et al., 2004) has been associated with poor prognosis in metastatic breast cancer.
To explain this discrepancy, it was suggested that normally antagonistic E2F proteins could compensate for each other in the formation of complexes with members of the pocket protein family, which are known to directly associate with E2Fs and regulate their activity (Chen HZ et al., 2009). Given the wide range of roles played by the many E2F family proteins, any speculation of a link between the putative E2F binding site at the G allele of rs3763959 and specific E2F family members is well beyond the scope of this study.

Methylation at rs3763959

The nonreference G allele of rs3763959 coincides with an individual CpG site. Although it may be tempting to speculate that a link exists between this CpG site and the possible association of the G/G genotype with breast carcinoma, there is no direct evidence to support such a conclusion. Furthermore, as mentioned previously (Table 5), either the reference or the nonreference allele of the majority of SNPs analyzed in this study overlaps a CpG site. Sixty to seventy percent of CpG dinucleotides are methylated in the genomes of mammalian somatic cells, and most unmethylated CpG sites are not isolated but rather are located in CpG islands in promoters of actively transcribed housekeeping genes (Hartl & Jones, 2009). Although it is possible that the aforementioned CpG sites are methylated, without solid data regarding the methylation status of individual CpG sites in all individuals genotyped in both the GEO datasets and the HapMap populations, it is premature to suggest the possibility that an altered methylation state in the G allele may affect expression, especially as there is no evidence from the Oncomine database of differential expression of galectin-9.
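The notion of a SNP allele coinciding with an individual CpG site can be made concrete with a short Python sketch. It is an illustration only, not the procedure used in this study: the function name is hypothetical, and the flanking bases in the example are placeholders chosen to mimic an A/G polymorphism whose G allele follows a C, not the actual sequence surrounding rs3763959.

```python
def alleles_forming_cpg(upstream_base, downstream_base, alleles):
    """Map each SNP allele to True if it forms a CpG with a neighbouring base.

    On the forward strand a CpG arises when the allele is C and the next
    base is G, or when the allele is G and the previous base is C.
    """
    up = upstream_base.upper()
    down = downstream_base.upper()
    result = {}
    for allele in alleles:
        base = allele.upper()
        result[base] = (base == "C" and down == "G") or (base == "G" and up == "C")
    return result


# Hypothetical flanking bases (assumption), NOT the real rs3763959 context.
print(alleles_forming_cpg("C", "T", ["A", "G"]))  # {'A': False, 'G': True}
```

A check of this kind only establishes that an allele creates a CpG dinucleotide; it says nothing about whether that site is methylated in any given individual, which is the limitation noted above.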
Conclusions and future directions

It should be cautioned that, because of the extremely divergent functions played by galectins in processes affecting cancer development, it would be unwise and premature to speculate on any individual explanation for the observations reported here. As correlation does not equate to causation, the role of factors other than the specific SNPs analyzed here cannot entirely be ruled out. It is conceivable that other polymorphisms which had not been included in the HapMap database, but were located in the vicinity of the analyzed HapMap SNPs (and hence may be inherited together with them due to linkage effects), might actually contribute to expression regulation. Furthermore, both the Popova (2009) and Waddell (2010) patient populations consisted of fewer than fifty individuals. A study done on a larger scale could alleviate this issue (Manolio, 2010).

Though an entirely in silico strategy for SNP association analysis may seem somewhat theoretical, this project utilized procedures commonly used in patient studies (Viviani Anselmi et al., 2008) and used datasets derived from previous human studies, thereby obviating the cost and ethical concerns associated with large-scale patient data collection. However, the results of this study should be considered preliminary rather than conclusive, and must be confirmed by further research. A case-comparison study involving larger numbers of subjects and controls could be performed to further confirm the associations postulated by this project. This would have several advantages in that the use of higher-resolution arrays could locate promoter SNPs not listed in the HapMap database but which may possibly affect transcription regulation; moreover, high-throughput array-based methylation assays such as the Illumina Infinium array (http://www.illumina.com) are now available which can determine the methylation status of individual CpG sites.
Fine-scale genetic mapping of galectin promoters could also provide a better picture of genetic linkage in these regions. Nevertheless, when planning any future studies, the potential benefits must be balanced by taking into consideration both the high cost of labor and/or equipment associated with genotyping studies and the increasing privacy concerns of patients involved in human genetic research.

In summary, it is reported here that through the analysis of public-domain data, a possible genotypic association between the LGALS9 upstream polymorphism rs3763959 and breast carcinoma has been uncovered, although further research must be undertaken to confirm the relevance of this finding.

APPENDICES

APPENDIX A

Oncomine results

1. List of individual Oncomine datasets cited

Bhattacharjee_Lung: Bhattacharjee et al., 2001
Boer_Renal: Boer et al., 2001
Bredel_Brain_2: Bredel et al., 2005
Buchholz_Pancreas: Buchholz et al., 2005
Chen_Liver: Chen et al., 2002
Chung_Head-Neck: Chung et al., 2004
Cromer_Head-Neck: Cromer et al., 2004
Dhanasekaran_Prostate: Dhanasekaran et al., 2001
Dyrskjot_Bladder_3: Dyrskjøt et al., 2004
French_Brain: French et al., 2005
FriersonHF_Salivary-gland: Frierson et al., 2002
Garber_Lung: Garber et al., 2001
Ginos_Head-Neck: Ginos et al., 2004
Gordon_Mesothelioma: Gordon et al., 2005
Graudens_Colon: Graudens et al., 2006
Haqq_Melanoma: Haqq et al., 2005
Hendrix_Ovarian: Hendrix et al., 2006
Hoek_Melanoma: Hoek et al., 2006
Huang_Thyroid: Huang et al., 2001
Iacobuzio-Donahue_Pancreas: Iacobuzio-Donahue et al., 2003
Iacobuzio-Donahue_Pancreas_2: Iacobuzio-Donahue et al., 2003
Ishikawa_Pancreas: Ishikawa et al., 2005
Korkola_Seminoma: Korkola et al., 2006
Lancaster_Ovarian: Lancaster et al., 2004
Lapointe_Prostate: Lapointe et al., 2004
Lenburg_Renal: Lenburg et al., 2003
Liang_Brain: Liang et al., 2005
Logsdon_Pancreas: Logsdon et al., 2003
Luo_Prostate: Luo et al., 2001
Luo_Prostate_2: Luo et al., 2002
Mutter_Endometrium: Mutter et al.,
2001
Quade_Uterus: Quade et al., 2004
Richardson_Breast_2: Richardson et al., 2006
Rickman_Brain: Rickman et al., 2001
Sanchez-Carbayo_Bladder_2: Sanchez-Carbayo et al., 2006
Shai_Brain: Shai et al., 2003
Skotheim_Multi-cancer: Skotheim et al., 2005
Sperger_Others: Sperger et al., 2003
Sun_Brain: Sun et al., 2006
Talantov_Melanoma: Talantov et al., 2005
Talbot_Lung: Talbot et al., 2005
Tomlins_Prostate: Tomlins et al., 2007
Toruner_Head-Neck: Toruner et al., 2004
Varambally_Prostate: Varambally et al., 2005
Wachi_Lung: Wachi et al., 2005
Wang_Esophagus: Wang et al., 2006
Welsh_Ovarian: Welsh et al., 2001a
Welsh_Prostate: Welsh et al., 2001b
Yu_Prostate: Yu et al., 2004

2. Normal vs. cancer, LGALS1 nonlymphoid/noncolon

a. Expression differences, p-value threshold of 1E-2

Figure A2.1a. LGALS1 expression in nonlymphoid, noncolon malignancies (higher expression in normal tissue). For each paired Oncomine dataset, median expression in normal tissue is shown by the left of the paired bars, and median expression in cancerous tissue is shown by the right of the paired bars. Data is organized in descending order based on median expression in cancer.

Figure A2.1b. LGALS1 expression in nonlymphoid, noncolon malignancies (higher expression in cancer). For each paired Oncomine dataset, median expression in normal tissue is shown by the left of the paired bars, and median expression in cancerous tissue is shown by the right of the paired bars. Data is organized in descending order based on median expression in cancer.

b. Individual studies, expression difference significant at 0.9 log-normalized expression units

Figure A2.2a. LGALS1 expression differences in oligodendroma, normal vs. cancer (Bredel et al., 2005), from Oncomine (p = 7.40E-04). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A2.2b. LGALS1 expression differences in head/neck, normal vs. cancer (Chung et al., 2004), from Oncomine (p = 3.1E-10).
Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A2.2c. LGALS1 expression differences in kidney, normal vs. cancer (Lenburg et al., 2003), from Oncomine (p = 4.1E-5). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A2.2d. LGALS1 expression differences in endometrium, normal vs. cancer (Mutter et al., 2001), from Oncomine (p = 6.1E-4). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A2.2e. LGALS1 expression differences in melanoma, normal vs. cancer (Talantov et al., 2005), from Oncomine (p = 2.4E-10). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A2.2f. LGALS1 expression differences in head-neck, normal vs. cancer (Toruner et al., 2004), from Oncomine (p = 3.0E-3). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

3. Normal vs. cancer, LGALS2 nonlymphoid/noncolon

a. Expression differences, p-value threshold of 1E-2

Figure A3.1a. LGALS2 expression in nonlymphoid, noncolon malignancies (higher expression in normal). For each paired Oncomine dataset, median expression in normal tissue is shown by the left of the paired bars, and median expression in cancerous tissue is shown by the right of the paired bars. Data is organized in descending order based on median expression in cancer.

Figure A3.1b. LGALS2 expression in nonlymphoid, noncolon malignancies (higher expression in cancer). For each paired Oncomine dataset, median expression in normal tissue is shown by the left of the paired bars, and median expression in cancerous tissue is shown by the right of the paired bars. Data is organized in descending order based on median expression in cancer.

b. Individual studies, expression difference significant at 0.9 log-normalized expression units

Figure A3.2a. LGALS2 expression differences in pancreatic ductal carcinoma, normal vs.
cancer (Ishikawa et al., 2005), from Oncomine (p = 3.0E-3). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

4. Normal vs. cancer, LGALS3 nonlymphoid/noncolon

a. Expression differences, p-value threshold of 1E-2

Figure A4.1a. LGALS3 expression in nonlymphoid, noncolon malignancies (higher expression in normal). For each paired Oncomine dataset, median expression in normal tissue is shown by the left of the paired bars, and median expression in cancerous tissue is shown by the right of the paired bars. Data is organized in descending order based on median expression in cancer.

Figure A4.1b. LGALS3 expression in nonlymphoid, noncolon malignancies (higher expression in cancer). For each paired Oncomine dataset, median expression in normal tissue is shown by the left of the paired bars, and median expression in cancerous tissue is shown by the right of the paired bars. Data is organized in descending order based on median expression in cancer.

b. Individual studies, expression difference significant at 0.9 log-normalized expression units

Figure A4.2a. LGALS3 expression differences in lung, normal vs. cancer (Bhattacharjee et al., 2001), from Oncomine (p = 3.0E-10). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A4.2b. LGALS3 expression differences in glioblastoma, normal vs. cancer (Bredel et al., 2005), from Oncomine (p = 3.8E-8). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A4.2c. LGALS3 expression differences in small cell lung cancer, normal vs. cancer (Garber et al., 2001), from Oncomine (p = 2.0E-3). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A4.2d. LGALS3 expression differences in melanoma, normal vs. cancer (Haqq et al., 2005), from Oncomine (p = 2.3E-5). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A4.2e.
LGALS3 expression differences in pancreatic adenocarcinoma, normal vs. cancer (Iacobuzio-Donahue et al., 2003), from Oncomine (p = 3.6E-6). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A4.2f. LGALS3 expression differences in prostate, normal vs. cancer (Lapointe et al., 2004), from Oncomine (p = 6.3E-10). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A4.2g. LGALS3 expression differences in prostate, normal vs. cancer (Luo et al., 2001), from Oncomine (p = 1.0E-3). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

5. Normal vs. cancer, LGALS4 nonlymphoid/noncolon

a. Expression differences, p-value threshold of 1E-2

Figure A5.1a. LGALS4 expression in nonlymphoid, noncolon malignancies (higher expression in normal). For each paired Oncomine dataset, median expression in normal tissue is shown by the left of the paired bars, and median expression in cancerous tissue is shown by the right of the paired bars. Data is organized in descending order based on median expression in cancer.

Figure A5.1b. LGALS4 expression in nonlymphoid, noncolon malignancies (higher expression in cancer). For each paired Oncomine dataset, median expression in normal tissue is shown by the left of the paired bars, and median expression in cancerous tissue is shown by the right of the paired bars. Data is organized in descending order based on median expression in cancer.

b. Individual studies, expression difference significant at 0.9 log-normalized expression units

Figure A5.2a. LGALS4 expression differences in liver, normal vs. cancer (Chen et al., 2002), from Oncomine (p = 3.00E-5). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A5.2b. LGALS4 expression differences in lung adenocarcinoma, normal vs. cancer (Garber et al., 2001), from Oncomine (p = 6.00E-03).
Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A5.2c. LGALS4 expression differences in lung squamous cell carcinoma, normal vs. cancer (Garber et al., 2001), from Oncomine (p = 2.00E-03). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A5.2d. LGALS4 expression differences in ovarian, normal vs. cancer (Hendrix et al., 2006), from Oncomine (p = 4.00E-04). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A5.2e. LGALS4 expression differences in pancreatic adenocarcinoma, normal vs. cancer (Iacobuzio-Donahue et al., 2003), from Oncomine (p = 1.30E-04). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A5.2f. LGALS4 expression differences in glioblastoma multiforme, normal vs. cancer (Liang et al., 2005), from Oncomine (p = 2.00E-3). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A5.2g. LGALS4 expression differences in esophagus, normal vs. cancer (Wang et al., 2006), from Oncomine (p = 5.00E-3). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

6. Normal vs. cancer, LGALS7 nonlymphoid/noncolon

a. Expression differences, p-value threshold of 1E-2

Figure A6.1a. LGALS7 expression in nonlymphoid, noncolon malignancies (higher expression in normal). For each paired Oncomine dataset, median expression in normal tissue is shown by the left of the paired bars, and median expression in cancerous tissue is shown by the right of the paired bars. Data is organized in descending order based on median expression in cancer.

Figure A6.1b. LGALS7 expression in nonlymphoid, noncolon malignancies (higher expression in cancer).
For each paired Oncomine dataset, median expression in normal tissue is shown by the left of the paired bars, and median expression in cancerous tissue is shown by the right of the paired bars. Data is organized in descending order based on median expression in cancer.

b. Individual studies, expression difference significant at 0.9 log-normalized expression units

Figure A6.2a. LGALS7 expression differences in lung squamous cell carcinoma, normal vs. cancer (Garber et al., 2001), from Oncomine (p = 5.00E-03). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A6.2b. LGALS7 expression differences in melanoma, normal vs. cancer (Haqq et al., 2005), from Oncomine (p = 7.80E-06). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A6.2c. LGALS7 expression differences in melanoma, normal vs. cancer (Talantov et al., 2005), from Oncomine (p = 1.10E-19). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A6.2d. LGALS7 expression differences in tongue squamous cell carcinoma, normal vs. cancer (Talbot et al., 2005), from Oncomine (p = 2.80E-08). Thick bars indicate median expression value; error bars show 10th and 90th percentiles. Note that this tongue tissue data was included as part of a lung study.

Figure A6.2e. LGALS7 expression differences in head-neck, normal vs. cancer (Toruner et al., 2004), from Oncomine (p = 5.00E-03). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A6.2f. LGALS7 expression differences in prostate, normal vs. cancer (Varambally et al., 2005), from Oncomine (p = 3.00E-03). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A6.2g. LGALS7 expression differences in lung squamous cell carcinoma, normal vs. cancer (Wachi et al., 2005), from Oncomine (p = 1.00E-02).
Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A6.2h. LGALS7 expression differences in esophagus, normal vs. cancer (Wang et al., 2006), from Oncomine (p = 2.30E-09). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

7. Normal vs. cancer, LGALS8 nonlymphoid/noncolon

a. Expression differences, p-value threshold of 1E-2

Figure A7.1a. LGALS8 expression in nonlymphoid, noncolon malignancies (higher expression in normal). For each paired Oncomine dataset, median expression in normal tissue is shown by the left of the paired bars, and median expression in cancerous tissue is shown by the right of the paired bars. Data is organized in descending order based on median expression in cancer.

Figure A7.1b. LGALS8 expression in nonlymphoid, noncolon malignancies (higher expression in cancer). For each paired Oncomine dataset, median expression in normal tissue is shown by the left of the paired bars, and median expression in cancerous tissue is shown by the right of the paired bars. Data is organized in descending order based on median expression in cancer.

b. Individual studies, expression difference significant at 0.9 log-normalized expression units

Figure A7.2a. LGALS8 expression differences in bladder, normal vs. cancer (Dyrskjøt et al., 2004), from Oncomine (p = 4.1E-9). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A7.2b. LGALS8 expression differences in mesothelioma, normal vs. cancer (Gordon et al., 2005), from Oncomine (p = 8.0E-3). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A7.2c. LGALS8 expression differences in ovarian, normal vs. cancer (Hendrix et al., 2006), from Oncomine (p = 1.2E-10). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A7.2d. LGALS8 expression differences in testis, normal vs.
cancer (Korkola et al., 2006), from Oncomine (p = 7.6E-9). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A7.2e. LGALS8 expression differences in testis, normal vs. cancer (Skotheim et al., 2005), from Oncomine (p = 4.0E-3). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A7.2f. LGALS8 expression differences in head-neck, normal vs. cancer (Toruner et al., 2004), from Oncomine (p = 3E-3). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A7.2g. LGALS8 expression differences in lung squamous cell carcinoma, normal vs. cancer (Wachi et al., 2005), from Oncomine (p = 6.4E-4). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

8. Normal vs. cancer, LGALS9 nonlymphoid/noncolon

a. Expression differences, p-value threshold of 1E-2

Figure A8.1a. LGALS9 expression in nonlymphoid, noncolon malignancies (higher expression in normal). For each paired Oncomine dataset, median expression in normal tissue is shown by the left of the paired bars, and median expression in cancerous tissue is shown by the right of the paired bars. Data is organized in descending order based on median expression in cancer.

Figure A8.1b. LGALS9 expression in nonlymphoid, noncolon malignancies (higher expression in cancer). For each paired Oncomine dataset, median expression in normal tissue is shown by the left of the paired bars, and median expression in cancerous tissue is shown by the right of the paired bars. Data is organized in descending order based on median expression in cancer.

b. Individual studies, expression difference significant at 0.9 log-normalized expression units

Figure A8.2a. LGALS9 expression differences in glioblastoma, normal vs. cancer (Bredel et al., 2005), from Oncomine (p = 5.9E-4). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.
Figure A8.2b. LGALS9 expression differences in pancreatic adenocarcinoma, normal vs. cancer (Iacobuzio-Donahue et al., 2003), from Oncomine (p = 1.9E-7). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A8.2c. LGALS9 expression differences in testis, normal vs. cancer (Sperger et al., 2003), from Oncomine (p = 3.3E-4). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

9. Normal vs. cancer, CLC/LGALS10 nonlymphoid/noncolon

a. Expression differences, p-value threshold of 1E-2

Figure A9.1a. LGALS10 expression in nonlymphoid, noncolon malignancies (higher expression in normal). For each paired Oncomine dataset, median expression in normal tissue is shown by the left of the paired bars, and median expression in cancerous tissue is shown by the right of the paired bars. Data is organized in descending order based on median expression in cancer.

Figure A9.1b. LGALS10 expression in nonlymphoid, noncolon malignancies (higher expression in cancer). For each paired Oncomine dataset, median expression in normal tissue is shown by the left of the paired bars, and median expression in cancerous tissue is shown by the right of the paired bars. Data is organized in descending order based on median expression in cancer.

b. Individual studies, expression difference significant at 0.9 log-normalized expression units

Figure A9.2a. LGALS10 expression differences in salivary, normal vs. cancer (Frierson et al., 2002), from Oncomine (p = 2.8E-4). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A9.2b. LGALS10 expression differences in lung adenocarcinoma, normal vs. cancer (Garber et al., 2001), from Oncomine (p = 3.00E-13). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A9.2c. LGALS10 expression differences in lung squamous cell carcinoma, normal vs.
cancer (Garber et al., 2001), from Oncomine (p = 2.10E-07). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

Figure A9.2d. LGALS10 expression differences in glioma, normal vs. cancer (Rickman et al., 2001), from Oncomine (p = 4E-3). Thick bars indicate median expression value; error bars show 10th and 90th percentiles.

APPENDIX B
Association analysis, genotype frequencies

For each SNP, genotype frequencies in datasets archived at the Gene Expression Omnibus (identified here by GEO accession number and associated research article, e.g. GSE16019: Chen M et al. (2009)) were compared against both individual and aggregated HapMap Phase 1 populations as controls, using the Williams-corrected G-test for independence to determine association. The statistical significance of each association analysis is shown in the tables below.

1. rs428007 (CLC/LGALS10)

a. GSE16019: Chen M et al. (2009)

Genotype frequencies
Population                T/T               T/C               C/C               G(1)      p-value(1)
GSE16019 (Chen)           18/77 = 0.234     30/77 = 0.390     29/77 = 0.377
HapMap Phase 1 (aggr.)    79/395 = 0.200    166/395 = 0.420   150/395 = 0.380   0.49      7.827E-01
HapMap CEU                11/112 = 0.098    46/112 = 0.411    55/112 = 0.491    6.667     3.567E-02
HapMap CHB                32/84 = 0.381     42/84 = 0.500     10/84 = 0.119     15.138    5.162E-04
HapMap JPT                33/86 = 0.384     44/86 = 0.512     9/86 = 0.105      17.487    1.595E-04
HapMap YRI                3/113 = 0.027     34/113 = 0.301    76/113 = 0.673    26.608    1.668E-06
(1) G and p-values are Williams’-corrected (GSE16019 vs. each HapMap population).

b. GSE13282: Gordan et al. (2008)

Genotype frequencies
Population                T/T               T/C               C/C               G(1)      p-value(1)
GSE13282 (Gordan)         1/21 = 0.048      12/21 = 0.571     8/21 = 0.381
HapMap Phase 1 (aggr.)    79/395 = 0.200    166/395 = 0.420   150/395 = 0.380   4.219     1.213E-01
HapMap CEU                11/112 = 0.098    46/112 = 0.411    55/112 = 0.491    1.925     3.819E-01
HapMap CHB                32/84 = 0.381     42/84 = 0.500     10/84 = 0.119     13.618    1.104E-03
HapMap JPT                33/86 = 0.384     44/86 = 0.512     9/86 = 0.105      14.615    6.705E-04
HapMap YRI                3/113 = 0.027     34/113 = 0.301    76/113 = 0.673    5.406     6.700E-02
(1) G and p-values are Williams’-corrected (GSE13282 vs. each HapMap population).

c. GSE19189: Letouze et al. (2010)

Genotype frequencies
Population                T/T               T/C               C/C               G(1)      p-value(1)
GSE19189 (Letouze)        1/20 = 0.050      11/20 = 0.550     8/20 = 0.400
HapMap Phase 1 (aggr.)    79/395 = 0.200    166/395 = 0.420   150/395 = 0.380   3.688     1.582E-01
HapMap CEU                11/112 = 0.098    46/112 = 0.411    55/112 = 0.491    1.437     4.875E-01
HapMap CHB                32/84 = 0.381     42/84 = 0.500     10/84 = 0.119     13.422    1.217E-03
HapMap JPT                33/86 = 0.384     44/86 = 0.512     9/86 = 0.105      14.451    7.278E-04
HapMap YRI                3/113 = 0.027     34/113 = 0.301    76/113 = 0.673    4.516     1.046E-01
(1) G and p-values are Williams’-corrected (GSE19189 vs. each HapMap population).

d. GSE10506: Nancarrow et al. (2008)

Genotype frequencies
Population                T/T               T/C               C/C               G(1)      p-value(1)
GSE10506 (Nancarrow)      2/20 = 0.100      11/20 = 0.550     7/20 = 0.350
HapMap Phase 1 (aggr.)    79/395 = 0.200    166/395 = 0.420   150/395 = 0.380   1.845     3.975E-01
HapMap CEU                11/112 = 0.098    46/112 = 0.411    55/112 = 0.491    1.408     4.946E-01
HapMap CHB                32/84 = 0.381     42/84 = 0.500     10/84 = 0.119     9.046     1.086E-02
HapMap JPT                33/86 = 0.384     44/86 = 0.512     9/86 = 0.105      9.909     7.052E-03
HapMap YRI                3/113 = 0.027     34/113 = 0.301    76/113 = 0.673    6.924     3.137E-02
(1) G and p-values are Williams’-corrected (GSE10506 vs. each HapMap population).

e. GSE18799: Popova et al. (2009)

Genotype frequencies
Population                T/T               T/C               C/C               G(1)      p-value(1)
GSE18799 (Popova)         3/21 = 0.143      9/21 = 0.429      9/21 = 0.429
HapMap Phase 1 (aggr.)    79/395 = 0.200    166/395 = 0.420   150/395 = 0.380   0.47      7.906E-01
HapMap CEU                11/112 = 0.098    46/112 = 0.411    55/112 = 0.491    0.445     8.005E-01
HapMap CHB                32/84 = 0.381     42/84 = 0.500     10/84 = 0.119     10.376    5.583E-03
HapMap JPT                33/86 = 0.384     44/86 = 0.512     9/86 = 0.105      11.591    3.041E-03
HapMap YRI                3/113 = 0.027     34/113 = 0.301    76/113 = 0.673    5.87      5.313E-02
(1) G and p-values are Williams’-corrected (GSE18799 vs. each HapMap population).

f. GSE9003: Stark & Hayward (2007)

Genotype frequencies
Population                T/T               T/C               C/C               G(1)      p-value(1)
GSE9003 (Stark)           7/73 = 0.096      25/73 = 0.342     41/73 = 0.562
HapMap Phase 1 (aggr.)    79/395 = 0.200    166/395 = 0.420   150/395 = 0.380   9.692     7.860E-03
HapMap CEU                11/112 = 0.098    46/112 = 0.411    55/112 = 0.491    0.948     6.225E-01
HapMap CHB                32/84 = 0.381     42/84 = 0.500     10/84 = 0.119     40.617    1.514E-09
HapMap JPT                33/86 = 0.384     44/86 = 0.512     9/86 = 0.105      44.172    2.560E-10
HapMap YRI                3/113 = 0.027     34/113 = 0.301    76/113 = 0.673    4.832     8.928E-02
(1) G and p-values are Williams’-corrected (GSE9003 vs. each HapMap population).

g. GSE19177: Waddell et al. (2010)

Genotype frequencies
Population                T/T               T/C               C/C               G(1)      p-value(1)
GSE19177 (Waddell)        6/31 = 0.194      11/31 = 0.355     14/31 = 0.452
HapMap Phase 1 (aggr.)    79/395 = 0.200    166/395 = 0.420   150/395 = 0.380   0.667     7.164E-01
HapMap CEU                11/112 = 0.098    46/112 = 0.411    55/112 = 0.491    1.859     3.948E-01
HapMap CHB                32/84 = 0.381     42/84 = 0.500     10/84 = 0.119     13.8      1.008E-03
HapMap JPT                33/86 = 0.384     44/86 = 0.512     9/86 = 0.105      15.551    4.199E-04
HapMap YRI                3/113 = 0.027     34/113 = 0.301    76/113 = 0.673    10.127    6.323E-03
(1) G and p-values are Williams’-corrected (GSE19177 vs. each HapMap population).

2. rs929039 (LGALS1)

a. GSE16019: Chen M et al. (2009)

Genotype frequencies
Population                T/T               T/C               C/C               G(1)      p-value(1)
GSE16019 (Chen)           34/75 = 0.453     34/75 = 0.453     7/75 = 0.093
HapMap Phase 1 (aggr.)    205/396 = 0.518   165/396 = 0.417   26/396 = 0.066    1.337     5.125E-01
HapMap CEU                51/113 = 0.451    50/113 = 0.442    12/113 = 0.106    0.085     9.584E-01
HapMap CHB                49/84 = 0.583     34/84 = 0.405     1/84 = 0.012      7.022     2.987E-02
HapMap JPT                39/86 = 0.453     41/86 = 0.477     6/86 = 0.070      0.314     8.547E-01
HapMap YRI                66/113 = 0.584    40/113 = 0.354    7/113 = 0.062     3.102     2.120E-01
(1) G and p-values are Williams’-corrected (GSE16019 vs. each HapMap population).

b. GSE13282: Gordan et al.
(2008)

Genotype frequencies
Population                T/T               T/C               C/C               G(1)      p-value(1)
GSE13282 (Gordan)         6/18 = 0.333      10/18 = 0.556     2/18 = 0.111
HapMap Phase 1 (aggr.)    205/396 = 0.518   165/396 = 0.417   26/396 = 0.066    2.272     3.211E-01
HapMap CEU                51/113 = 0.451    50/113 = 0.442    12/113 = 0.106    0.894     6.395E-01
HapMap CHB                49/84 = 0.583     34/84 = 0.405     1/84 = 0.012      5.238     7.288E-02
HapMap JPT                39/86 = 0.453     41/86 = 0.477     6/86 = 0.070      0.939     6.253E-01
HapMap YRI                66/113 = 0.584    40/113 = 0.354    7/113 = 0.062     3.667     1.599E-01
(1) G and p-values are Williams’-corrected (GSE13282 vs. each HapMap population).

c. GSE19189: Letouze et al. (2010)

Genotype frequencies
Population                T/T               T/C               C/C               G(1)      p-value(1)
GSE19189 (Letouze)        7/19 = 0.368      10/19 = 0.526     2/19 = 0.105
HapMap Phase 1 (aggr.)    205/396 = 0.518   165/396 = 0.417   26/396 = 0.066    1.588     4.520E-01
HapMap CEU                51/113 = 0.451    50/113 = 0.442    12/113 = 0.106    0.48      7.866E-01
HapMap CHB                49/84 = 0.583     34/84 = 0.405     1/84 = 0.012      4.53      1.038E-01
HapMap JPT                39/86 = 0.453     41/86 = 0.477     6/86 = 0.070      0.541     7.630E-01
HapMap YRI                66/113 = 0.584    40/113 = 0.354    7/113 = 0.062     2.845     2.411E-01
(1) G and p-values are Williams’-corrected (GSE19189 vs. each HapMap population).

d. GSE10506: Nancarrow et al. (2008)

Genotype frequencies
Population                T/T               T/C               C/C               G(1)      p-value(1)
GSE10506 (Nancarrow)      13/20 = 0.65      4/20 = 0.20       3/20 = 0.15
HapMap Phase 1 (aggr.)    205/396 = 0.518   165/396 = 0.417   26/396 = 0.066    4.432     1.090E-01
HapMap CEU                51/113 = 0.451    50/113 = 0.442    12/113 = 0.106    4.256     1.191E-01
HapMap CHB                49/84 = 0.583     34/84 = 0.405     1/84 = 0.012      7.159     2.789E-02
HapMap JPT                39/86 = 0.453     41/86 = 0.477     6/86 = 0.070      5.391     6.751E-02
HapMap YRI                66/113 = 0.584    40/113 = 0.354    7/113 = 0.062     2.74      2.541E-01
(1) G and p-values are Williams’-corrected (GSE10506 vs. each HapMap population).

e. GSE18799: Popova et al. (2009)

Genotype frequencies
Population                T/T               T/C               C/C               G(1)      p-value(1)
GSE18799 (Popova)         7/19 = 0.368      8/19 = 0.421      4/19 = 0.211
HapMap Phase 1 (aggr.)    205/396 = 0.518   165/396 = 0.417   26/396 = 0.066    4.109     1.282E-01
HapMap CEU                51/113 = 0.451    50/113 = 0.442    12/113 = 0.106    1.449     4.846E-01
HapMap CHB                49/84 = 0.583     34/84 = 0.405     1/84 = 0.012      9.363     9.265E-03
HapMap JPT                39/86 = 0.453     41/86 = 0.477     6/86 = 0.070      2.808     2.456E-01
HapMap YRI                66/113 = 0.584    40/113 = 0.354    7/113 = 0.062     4.652     9.769E-02
(1) G and p-values are Williams’-corrected (GSE18799 vs. each HapMap population).

f. GSE9003: Stark & Hayward (2007)

Genotype frequencies
Population                T/T               T/C               C/C               G(1)      p-value(1)
GSE9003 (Stark)           30/64 = 0.469     28/64 = 0.438     6/64 = 0.094
HapMap Phase 1 (aggr.)    205/396 = 0.518   165/396 = 0.417   26/396 = 0.066    0.873     6.463E-01
HapMap CEU                51/113 = 0.451    50/113 = 0.442    12/113 = 0.106    0.09      9.560E-01
HapMap CHB                49/84 = 0.583     34/84 = 0.405     1/84 = 0.012      6.187     4.534E-02
HapMap JPT                39/86 = 0.453     41/86 = 0.477     6/86 = 0.070      0.392     8.220E-01
HapMap YRI                66/113 = 0.584    40/113 = 0.354    7/113 = 0.062     2.238     3.266E-01
(1) G and p-values are Williams’-corrected (GSE9003 vs. each HapMap population).

g. GSE19177: Waddell et al. (2010)

Genotype frequencies
Population                T/T               T/C               C/C               G(1)      p-value(1)
GSE19177 (Waddell)        13/32 = 0.406     11/32 = 0.344     8/32 = 0.25
HapMap Phase 1 (aggr.)    205/396 = 0.518   165/396 = 0.417   26/396 = 0.066    9.224     9.932E-03
HapMap CEU                51/113 = 0.451    50/113 = 0.442    12/113 = 0.106    3.847     1.461E-01
HapMap CHB                49/84 = 0.583     34/84 = 0.405     1/84 = 0.012      15.892    3.541E-04
HapMap JPT                39/86 = 0.453     41/86 = 0.477     6/86 = 0.070      6.439     3.998E-02
HapMap YRI                66/113 = 0.584    40/113 = 0.354    7/113 = 0.062     8.194     1.662E-02
(1) G and p-values are Williams’-corrected (GSE19177 vs. each HapMap population).

3. rs2235338 (LGALS2)

a. GSE21168: Castillo et al. (2010)

Genotype frequencies
Population                G/G               A/G               A/A               G(1)      p-value(1)
GSE21168 (Castillo)       2/4 = 0.500       1/4 = 0.250       1/4 = 0.250
HapMap Phase 1 (aggr.)    127/396 = 0.321   186/396 = 0.470   83/396 = 0.210    0.723     6.966E-01
HapMap CEU                12/113 = 0.106    60/113 = 0.531    41/113 = 0.363    2.979     2.255E-01
HapMap CHB                41/84 = 0.488     36/84 = 0.429     7/84 = 0.083      0.878     6.447E-01
HapMap JPT                23/86 = 0.267     39/86 = 0.453     24/86 = 0.279     0.589     7.449E-01
HapMap YRI                51/113 = 0.451    51/113 = 0.451    11/113 = 0.097    0.838     6.577E-01
(1) G and p-values are Williams’-corrected (GSE21168 vs. each HapMap population).

b.
GSE16019: Chen M et al. (2009)

Genotype frequencies
Population                G/G               A/G               A/A               G(1)      p-value(1)
GSE16019 (Chen)           13/76 = 0.171     40/76 = 0.526     23/76 = 0.303
HapMap Phase 1 (aggr.)    127/396 = 0.321   186/396 = 0.470   83/396 = 0.210    8.124     1.721E-02
HapMap CEU                12/113 = 0.106    60/113 = 0.531    41/113 = 0.363    1.878     3.910E-01
HapMap CHB                41/84 = 0.488     36/84 = 0.429     7/84 = 0.083      23.705    7.121E-06
HapMap JPT                23/86 = 0.267     39/86 = 0.453     24/86 = 0.279     2.2       3.329E-01
HapMap YRI                51/113 = 0.451    51/113 = 0.451    11/113 = 0.097    22.195    1.515E-05
(1) G and p-values are Williams’-corrected (GSE16019 vs. each HapMap population).

c. GSE13282: Gordan et al. (2008)

Genotype frequencies
Population                G/G               A/G               A/A               G(1)      p-value(1)
GSE13282 (Gordan)         5/20 = 0.25       7/20 = 0.35       8/20 = 0.4
HapMap Phase 1 (aggr.)    127/396 = 0.321   186/396 = 0.470   83/396 = 0.210    3.397     1.830E-01
HapMap CEU                12/113 = 0.106    60/113 = 0.531    41/113 = 0.363    3.372     1.853E-01
HapMap CHB                41/84 = 0.488     36/84 = 0.429     7/84 = 0.083      10.766    4.594E-03
HapMap JPT                23/86 = 0.267     39/86 = 0.453     24/86 = 0.279     1.131     5.681E-01
HapMap YRI                51/113 = 0.451    51/113 = 0.451    11/113 = 0.097    9.878     7.162E-03
(1) G and p-values are Williams’-corrected (GSE13282 vs. each HapMap population).

d. GSE19189: Letouze et al. (2010)

Genotype frequencies
Population                G/G               A/G               A/A               G(1)      p-value(1)
GSE19189 (Letouze)        3/19 = 0.158      8/19 = 0.421      8/19 = 0.421
HapMap Phase 1 (aggr.)    127/396 = 0.321   186/396 = 0.470   83/396 = 0.210    4.717     9.456E-02
HapMap CEU                12/113 = 0.106    60/113 = 0.531    41/113 = 0.363    0.847     6.548E-01
HapMap CHB                41/84 = 0.488     36/84 = 0.429     7/84 = 0.083      13.482    1.181E-03
HapMap JPT                23/86 = 0.267     39/86 = 0.453     24/86 = 0.279     1.756     4.156E-01
HapMap YRI                51/113 = 0.451    51/113 = 0.451    11/113 = 0.097    12.325    2.107E-03
(1) G and p-values are Williams’-corrected (GSE19189 vs. each HapMap population).

e. GSE10506: Nancarrow et al. (2008)

Genotype frequencies
Population                G/G               A/G               A/A               G(1)      p-value(1)
GSE10506 (Nancarrow)      5/20 = 0.25       5/20 = 0.25       10/20 = 0.5
HapMap Phase 1 (aggr.)    127/396 = 0.321   186/396 = 0.470   83/396 = 0.210    7.811     2.013E-02
HapMap CEU                12/113 = 0.106    60/113 = 0.531    41/113 = 0.363    5.987     5.011E-02
HapMap CHB                41/84 = 0.488     36/84 = 0.429     7/84 = 0.083      16.065    3.247E-04
HapMap JPT                23/86 = 0.267     39/86 = 0.453     24/86 = 0.279     3.904     1.420E-01
HapMap YRI                51/113 = 0.451    51/113 = 0.451    11/113 = 0.097    15.484    4.342E-04
(1) G and p-values are Williams’-corrected (GSE10506 vs. each HapMap population).

f. GSE18799: Popova et al. (2009)

Genotype frequencies
Population                G/G               A/G               A/A               G(1)      p-value(1)
GSE18799 (Popova)         7/19 = 0.368      8/19 = 0.421      4/19 = 0.211
HapMap Phase 1 (aggr.)    127/396 = 0.321   186/396 = 0.470   83/396 = 0.210    0.209     9.008E-01
HapMap CEU                12/113 = 0.106    60/113 = 0.531    41/113 = 0.363    7.168     2.776E-02
HapMap CHB                41/84 = 0.488     36/84 = 0.429     7/84 = 0.083      2.327     3.124E-01
HapMap JPT                23/86 = 0.267     39/86 = 0.453     24/86 = 0.279     0.815     6.653E-01
HapMap YRI                51/113 = 0.451    51/113 = 0.451    11/113 = 0.097    1.733     4.204E-01
(1) G and p-values are Williams’-corrected (GSE18799 vs. each HapMap population).

g. GSE9003: Stark & Hayward (2007)

Genotype frequencies
Population                G/G               A/G               A/A               G(1)      p-value(1)
GSE9003 (Stark)           11/72 = 0.153     31/72 = 0.431     30/72 = 0.417
HapMap Phase 1 (aggr.)    127/396 = 0.321   186/396 = 0.470   83/396 = 0.210    16.154    3.106E-04
HapMap CEU                12/113 = 0.106    60/113 = 0.531    41/113 = 0.363    1.964     3.746E-01
HapMap CHB                41/84 = 0.488     36/84 = 0.429     7/84 = 0.083      32.821    7.465E-08
HapMap JPT                23/86 = 0.267     39/86 = 0.453     24/86 = 0.279     4.605     1.000E-01
HapMap YRI                51/113 = 0.451    51/113 = 0.451    11/113 = 0.097    32.49     8.808E-08
(1) G and p-values are Williams’-corrected (GSE9003 vs. each HapMap population).

h. GSE19177: Waddell et al. (2010)

Genotype frequencies
Population                G/G               A/G               A/A               G(1)      p-value(1)
GSE19177 (Waddell)        6/34 = 0.176      19/34 = 0.559     9/34 = 0.265
HapMap Phase 1 (aggr.)    127/396 = 0.321   186/396 = 0.470   83/396 = 0.210    3.302     1.919E-01
HapMap CEU                12/113 = 0.106    60/113 = 0.531    41/113 = 0.363    1.734     4.202E-01
HapMap CHB                41/84 = 0.488     36/84 = 0.429     7/84 = 0.083      12.597    1.839E-03
HapMap JPT                23/86 = 0.267     39/86 = 0.453     24/86 = 0.279     1.417     4.924E-01
HapMap YRI                51/113 = 0.451    51/113 = 0.451    11/113 = 0.097    10.948    4.194E-03
(1) G and p-values are Williams’-corrected (GSE19177 vs. each HapMap population).

4. rs3763959 (LGALS9)

a. GSE21168: Castillo et al. (2010)

Genotype frequencies
Population                A/A               A/G               G/G               G(1)      p-value(1)
GSE21168 (Castillo)       0/4 = 0.000       3/4 = 0.750       1/4 = 0.250
HapMap Phase 1 (aggr.)    44/395 = 0.111    152/395 = 0.385   199/395 = 0.504   2.041     3.604E-01
HapMap CEU                23/113 = 0.204    55/113 = 0.487    35/113 = 0.310    1.771     4.125E-01
HapMap CHB                10/84 = 0.119     38/84 = 0.452     36/84 = 0.429     1.498     4.728E-01
HapMap JPT                8/85 = 0.094      36/85 = 0.424     41/85 = 0.482     1.559     4.586E-01
HapMap YRI                3/113 = 0.027     23/113 = 0.204    87/113 = 0.770    2.784     2.486E-01
(1) G and p-values are Williams’-corrected (GSE21168 vs. each HapMap population).

b. GSE16019: Chen M et al. (2009)

Genotype frequencies
Population                A/A               A/G               G/G               G(1)      p-value(1)
GSE16019 (Chen)           14/77 = 0.182     37/77 = 0.481     26/77 = 0.338
HapMap Phase 1 (aggr.)    44/395 = 0.111    152/395 = 0.385   199/395 = 0.504   7.711     2.116E-02
HapMap CEU                23/113 = 0.204    55/113 = 0.487    35/113 = 0.310    0.223     8.945E-01
HapMap CHB                10/84 = 0.119     38/84 = 0.452     36/84 = 0.429     1.967     3.740E-01
HapMap JPT                8/85 = 0.094      36/85 = 0.424     41/85 = 0.482     4.584     1.011E-01
HapMap YRI                3/113 = 0.027     23/113 = 0.204    87/113 = 0.770    38.117    5.284E-09
(1) G and p-values are Williams’-corrected (GSE16019 vs. each HapMap population).

c. GSE13282: Gordan et al.
(2008)

Genotype frequencies
Population                A/A               A/G               G/G               G(1)      p-value(1)
GSE13282 (Gordan)         8/21 = 0.381      6/21 = 0.286      7/21 = 0.333
HapMap Phase 1 (aggr.)    44/395 = 0.111    152/395 = 0.385   199/395 = 0.504   9.141     1.035E-02
HapMap CEU                23/113 = 0.204    55/113 = 0.487    35/113 = 0.310    3.758     1.527E-01
HapMap CHB                10/84 = 0.119     38/84 = 0.452     36/84 = 0.429     6.806     3.327E-02
HapMap JPT                8/85 = 0.094      36/85 = 0.424     41/85 = 0.482     8.645     1.327E-02
HapMap YRI                3/113 = 0.027     23/113 = 0.204    87/113 = 0.770    22.491    1.307E-05
(1) G and p-values are Williams’-corrected (GSE13282 vs. each HapMap population).

d. GSE19189: Letouze et al. (2010)

Genotype frequencies
Population                A/A               A/G               G/G               G(1)      p-value(1)
GSE19189 (Letouze)        4/19 = 0.211      7/19 = 0.368      8/19 = 0.421
HapMap Phase 1 (aggr.)    44/395 = 0.111    152/395 = 0.385   199/395 = 0.504   1.446     4.853E-01
HapMap CEU                23/113 = 0.204    55/113 = 0.487    35/113 = 0.310    1.053     5.907E-01
HapMap CHB                10/84 = 0.119     38/84 = 0.452     36/84 = 0.429     1.059     5.889E-01
HapMap JPT                8/85 = 0.094      36/85 = 0.424     41/85 = 0.482     1.701     4.272E-01
HapMap YRI                3/113 = 0.027     23/113 = 0.204    87/113 = 0.770    10.601    4.989E-03
(1) G and p-values are Williams’-corrected (GSE19189 vs. each HapMap population).

e. GSE10506: Nancarrow et al. (2008)

Genotype frequencies
Population                A/A               A/G               G/G               G(1)      p-value(1)
GSE10506 (Nancarrow)      3/16 = 0.188      4/16 = 0.25       9/16 = 0.563
HapMap Phase 1 (aggr.)    44/395 = 0.111    152/395 = 0.385   199/395 = 0.504   1.496     4.733E-01
HapMap CEU                23/113 = 0.204    55/113 = 0.487    35/113 = 0.310    4.089     1.294E-01
HapMap CHB                10/84 = 0.119     38/84 = 0.452     36/84 = 0.429     2.296     3.173E-01
HapMap JPT                8/85 = 0.094      36/85 = 0.424     41/85 = 0.482     2.101     3.498E-01
HapMap YRI                3/113 = 0.027     23/113 = 0.204    87/113 = 0.770    5.269     7.175E-02
(1) G and p-values are Williams’-corrected (GSE10506 vs. each HapMap population).

f. GSE18799: Popova et al. (2009)

Genotype frequencies
Population                A/A               A/G               G/G               G(1)      p-value(1)
GSE18799 (Popova)         4/18 = 0.222      3/18 = 0.167      11/18 = 0.611
HapMap Phase 1 (aggr.)    44/395 = 0.111    152/395 = 0.385   199/395 = 0.504   4.308     1.160E-01
HapMap CEU                23/113 = 0.204    55/113 = 0.487    35/113 = 0.310    7.662     2.169E-02
HapMap CHB                10/84 = 0.119     38/84 = 0.452     36/84 = 0.429     5.416     6.667E-02
HapMap JPT                8/85 = 0.094      36/85 = 0.424     41/85 = 0.482     5.061     7.962E-02
HapMap YRI                3/113 = 0.027     23/113 = 0.204    87/113 = 0.770    7.064     2.925E-02
(1) G and p-values are Williams’-corrected (GSE18799 vs. each HapMap population).

g. GSE9003: Stark & Hayward (2007)

Genotype frequencies
Population                A/A               A/G               G/G               G(1)      p-value(1)
GSE9003 (Stark)           20/67 = 0.299     22/67 = 0.328     25/67 = 0.373
HapMap Phase 1 (aggr.)    44/395 = 0.111    152/395 = 0.385   199/395 = 0.504   13.995    9.142E-04
HapMap CEU                23/113 = 0.204    55/113 = 0.487    35/113 = 0.310    4.549     1.028E-01
HapMap CHB                10/84 = 0.119     38/84 = 0.452     36/84 = 0.429     7.676     2.154E-02
HapMap JPT                8/85 = 0.094      36/85 = 0.424     41/85 = 0.482     10.345    5.670E-03
HapMap YRI                3/113 = 0.027     23/113 = 0.204    87/113 = 0.770    37.827    6.109E-09
(1) G and p-values are Williams’-corrected (GSE9003 vs. each HapMap population).

h. GSE19177: Waddell et al. (2010)

Genotype frequencies
Population                A/A               A/G               G/G               G(1)      p-value(1)
GSE19177 (Waddell)        10/31 = 0.323     6/31 = 0.194      15/31 = 0.484
HapMap Phase 1 (aggr.)    44/395 = 0.111    152/395 = 0.385   199/395 = 0.504   10.4      5.517E-03
HapMap CEU                23/113 = 0.204    55/113 = 0.487    35/113 = 0.310    8.999     1.111E-02
HapMap CHB                10/84 = 0.119     38/84 = 0.452     36/84 = 0.429     9.22      9.952E-03
HapMap JPT                8/85 = 0.094      36/85 = 0.424     41/85 = 0.482     10.104    6.397E-03
HapMap YRI                3/113 = 0.027     23/113 = 0.204    87/113 = 0.770    20.261    3.985E-05
(1) G and p-values are Williams’-corrected (GSE19177 vs. each HapMap population).

5. rs4820294 (LGALS1)

a. GSE21168: Castillo et al. (2010)

Genotype frequencies
Population                G/G               A/G               A/A               G(1)      p-value(1)
GSE21168 (Castillo)       0/3 = 0.000       1/3 = 0.333       2/3 = 0.667
HapMap Phase 1 (aggr.)    203/392 = 0.518   163/392 = 0.416   26/392 = 0.066    5.828     5.426E-02
HapMap CEU                50/112 = 0.446    50/112 = 0.446    12/112 = 0.107    4.877     8.729E-02
HapMap CHB                49/84 = 0.583     34/84 = 0.405     1/84 = 0.012      6.956     3.087E-02
HapMap JPT                38/84 = 0.452     40/84 = 0.476     6/84 = 0.071      5.51      6.361E-02
HapMap YRI                66/112 = 0.589    39/112 = 0.348    7/112 = 0.062     6.122     4.684E-02
(1) G and p-values are Williams’-corrected (GSE21168 vs. each HapMap population).

b.
GSE16019: Chen M et al. (2009)

Genotype frequencies
Population                G/G               A/G               A/A               G(1)      p-value(1)
GSE16019 (Chen)           34/70 = 0.486     29/70 = 0.414     7/70 = 0.100
HapMap Phase 1 (aggr.)    203/392 = 0.518   163/392 = 0.416   26/392 = 0.066    0.959     6.191E-01
HapMap CEU                50/112 = 0.446    50/112 = 0.446    12/112 = 0.107    0.262     8.772E-01
HapMap CHB                49/84 = 0.583     34/84 = 0.405     1/84 = 0.012      6.664     3.572E-02
HapMap JPT                38/84 = 0.452     40/84 = 0.476     6/84 = 0.071      0.767     6.815E-01
HapMap YRI                66/112 = 0.589    39/112 = 0.348    7/112 = 0.062     2.068     3.556E-01
(1) G and p-values are Williams’-corrected (GSE16019 vs. each HapMap population).

c. GSE13282: Gordan et al. (2008)

Genotype frequencies
Population                G/G               A/G               A/A               G(1)      p-value(1)
GSE13282 (Gordan)         7/15 = 0.467      7/15 = 0.467      1/15 = 0.067
HapMap Phase 1 (aggr.)    203/392 = 0.518   163/392 = 0.416   26/392 = 0.066    0.147     9.291E-01
HapMap CEU                50/112 = 0.446    50/112 = 0.446    12/112 = 0.107    0.244     8.851E-01
HapMap CHB                49/84 = 0.583     34/84 = 0.405     1/84 = 0.012      1.359     5.069E-01
HapMap JPT                38/84 = 0.452     40/84 = 0.476     6/84 = 0.071      0.011     9.945E-01
HapMap YRI                66/112 = 0.589    39/112 = 0.348    7/112 = 0.062     0.763     6.828E-01
(1) G and p-values are Williams’-corrected (GSE13282 vs. each HapMap population).

d. GSE19189: Letouze et al. (2010)

Genotype frequencies
Population                G/G               A/G               A/A               G(1)      p-value(1)
GSE19189 (Letouze)        7/20 = 0.350      10/20 = 0.500     3/20 = 0.150
HapMap Phase 1 (aggr.)    203/392 = 0.518   163/392 = 0.416   26/392 = 0.066    2.729     2.555E-01
HapMap CEU                50/112 = 0.446    50/112 = 0.446    12/112 = 0.107    0.707     7.022E-01
HapMap CHB                49/84 = 0.583     34/84 = 0.405     1/84 = 0.012      7.065     2.923E-02
HapMap JPT                38/84 = 0.452     40/84 = 0.476     6/84 = 0.071      1.342     5.112E-01
HapMap YRI                66/112 = 0.589    39/112 = 0.348    7/112 = 0.062     4.061     1.313E-01
(1) G and p-values are Williams’-corrected (GSE19189 vs. each HapMap population).

e. GSE10506: Nancarrow et al. (2008)

Genotype frequencies
Population                G/G               A/G               A/A               G(1)      p-value(1)
GSE10506 (Nancarrow)      14/22 = 0.636     3/22 = 0.136      5/22 = 0.227
HapMap Phase 1 (aggr.)    203/392 = 0.518   163/392 = 0.416   26/392 = 0.066    10.063    6.529E-03
HapMap CEU                50/112 = 0.446    50/112 = 0.446    12/112 = 0.107    8.398     1.501E-02
HapMap CHB                49/84 = 0.583     34/84 = 0.405     1/84 = 0.012      14.099    8.678E-04
HapMap JPT                38/84 = 0.452     40/84 = 0.476     6/84 = 0.071      10.23     6.006E-03
HapMap YRI                66/112 = 0.589    39/112 = 0.348    7/112 = 0.062     7.14      2.816E-02
(1) G and p-values are Williams’-corrected (GSE10506 vs. each HapMap population).

f. GSE18799: Popova et al. (2009)

Genotype frequencies
Population                G/G               A/G               A/A               G(1)      p-value(1)
GSE18799 (Popova)         6/17 = 0.353      8/17 = 0.471      3/17 = 0.176
HapMap Phase 1 (aggr.)    203/392 = 0.518   163/392 = 0.416   26/392 = 0.066    2.839     2.418E-01
HapMap CEU                50/112 = 0.446    50/112 = 0.446    12/112 = 0.107    0.823     6.627E-01
HapMap CHB                49/84 = 0.583     34/84 = 0.405     1/84 = 0.012      7.202     2.730E-02
HapMap JPT                38/84 = 0.452     40/84 = 0.476     6/84 = 0.071      1.663     4.354E-01
HapMap YRI                66/112 = 0.589    39/112 = 0.348    7/112 = 0.062     3.837     1.468E-01
(1) G and p-values are Williams’-corrected (GSE18799 vs. each HapMap population).

g. GSE9003: Stark & Hayward (2007)

Genotype frequencies
Population                G/G               A/G               A/A               G(1)      p-value(1)
GSE9003 (Stark)           38/56 = 0.679     10/56 = 0.179     8/56 = 0.143
HapMap Phase 1 (aggr.)    203/392 = 0.518   163/392 = 0.416   26/392 = 0.066    13.663    1.079E-03
HapMap CEU                50/112 = 0.446    50/112 = 0.446    12/112 = 0.107    12.27     2.166E-03
HapMap CHB                49/84 = 0.583     34/84 = 0.405     1/84 = 0.012      15.232    4.925E-04
HapMap JPT                38/84 = 0.452     40/84 = 0.476     6/84 = 0.071      13.574    1.128E-03
HapMap YRI                66/112 = 0.589    39/112 = 0.348    7/112 = 0.062     6.829     3.289E-02
(1) G and p-values are Williams’-corrected (GSE9003 vs. each HapMap population).

h. GSE19177: Waddell et al. (2010)

Genotype frequencies
Population                G/G               A/G               A/A               G(1)      p-value(1)
GSE19177 (Waddell)        13/27 = 0.481     8/27 = 0.296      6/27 = 0.222
HapMap Phase 1 (aggr.)    203/392 = 0.518   163/392 = 0.416   26/392 = 0.066    6.204     4.496E-02
HapMap CEU                50/112 = 0.446    50/112 = 0.446    12/112 = 0.107    3.153     2.067E-01
HapMap CHB                49/84 = 0.583     34/84 = 0.405     1/84 = 0.012      12.068    2.396E-03
HapMap JPT                38/84 = 0.452     40/84 = 0.476     6/84 = 0.071      5.154     7.600E-02
HapMap YRI                66/112 = 0.589    39/112 = 0.348    7/112 = 0.062     5.145     7.634E-02
(1) G and p-values are Williams’-corrected (GSE19177 vs. each HapMap population).

6. rs10403583 (LGALS4)

a.
GSE21168: Castillo et al. (2010) Genotype frequencies Castillo, GSE21168 GSE21168 HapMap G1 p-value1 A/A: 0/3 = 0.000 A/G: 1/3 = 0.333 G/G: 2/3 = 0.667 HapMap Phase 1 (aggr.) A/A: 12/396 = 0.030 A/G: 108/396 = 0.273 G/G: 276/396 = 0.697 GSE21168 vs. HapMap aggr. 0 1 2 12 108 276 0 1 2 1 35 77 0 1 2 1 16 67 0 1 2 0 13 73 1 0.108 9.474E-01 0.432 8.057E-01 GSE21168 vs. HapMap YRI HapMap YRI A/A: 10/113 = 0.088 A/G: 44/113 = 0.389 G/G: 59/113 = 0.522 9.935E-01 GSE21168 vs. HapMap JPT HapMap JPT A/A: 0/86 = 0.000 A/G: 13/86 = 0.151 G/G: 73/86 = 0.849 0.013 GSE21168 vs. HapMap CHB HapMap CHB A/A: 1/84 = 0.012 A/G: 16/84 = 0.190 G/G: 67/84 = 0.798 9.470E-01 GSE21168 vs. HapMap CEU HapMap CEU A/A: 1/113 = 0.009 A/G: 35/113 = 0.310 G/G: 77/113 = 0.681 0.109 0 1 2 G and p-values are Williams’-corrected 10 44 59 0.462 7.937E-01
b. GSE16019: Chen M et al. (2009) Genotype frequencies Chen, GSE16019 GSE16019 HapMap G1 p-value1 A/A: 3/79 = 0.038 A/G: 25/79 = 0.316 G/G: 51/79 = 0.646 HapMap Phase 1 (aggr.) A/A: 12/396 = 0.030 A/G: 108/396 = 0.273 G/G: 276/396 = 0.697 GSE16019 vs. HapMap aggr. 3 25 51 12 108 276 3 25 51 1 35 77 3 25 51 1 16 67 3 25 51 0 13 73 1 4.733 9.381E-02 10.673 4.813E-03 GSE16019 vs. HapMap YRI HapMap YRI A/A: 10/113 = 0.088 A/G: 44/113 = 0.389 G/G: 59/113 = 0.522 3.953E-01 GSE16019 vs. HapMap JPT HapMap JPT A/A: 0/86 = 0.000 A/G: 13/86 = 0.151 G/G: 73/86 = 0.849 1.856 GSE16019 vs. HapMap CHB HapMap CHB A/A: 1/84 = 0.012 A/G: 16/84 = 0.190 G/G: 67/84 = 0.798 6.774E-01 GSE16019 vs. HapMap CEU HapMap CEU A/A: 1/113 = 0.009 A/G: 35/113 = 0.310 G/G: 77/113 = 0.681 0.779 3 25 51 G and p-values are Williams’-corrected 10 44 59 3.714 1.561E-01
c. GSE13282: Gordan et al. (2008) Genotype frequencies Gordan, GSE13282 GSE13282 HapMap G1 p-value1 A/A: 1/21 = 0.048 A/G: 6/21 = 0.286 G/G: 14/21 = 0.667 HapMap Phase 1 (aggr.) A/A: 12/396 = 0.030 A/G: 108/396 = 0.273 G/G: 276/396 = 0.697 GSE13282 vs. HapMap aggr.
1 6 14 12 108 276 1 6 14 1 35 77 1 6 14 1 16 67 1 6 14 0 13 73 1 1.574 4.552E-01 3.744 1.538E-01 GSE13282 vs. HapMap YRI HapMap YRI A/A: 10/113 = 0.088 A/G: 44/113 = 0.389 G/G: 59/113 = 0.522 6.005E-01 GSE13282 vs. HapMap JPT HapMap JPT A/A: 0/86 = 0.000 A/G: 13/86 = 0.151 G/G: 73/86 = 0.849 1.02 GSE13282 vs. HapMap CHB HapMap CHB A/A: 1/84 = 0.012 A/G: 16/84 = 0.190 G/G: 67/84 = 0.798 9.144E-01 GSE13282 vs. HapMap CEU HapMap CEU A/A: 1/113 = 0.009 A/G: 35/113 = 0.310 G/G: 77/113 = 0.681 0.179 1 6 14 G and p-values are Williams’-corrected 10 44 59 1.506 4.710E-01
d. GSE19189: Letouze et al. (2010) Genotype frequencies Letouze, GSE19189 GSE19189 HapMap G1 p-value1 A/A: 0/18 = 0.000 A/G: 3/18 = 0.167 G/G: 15/18 = 0.833 HapMap Phase 1 (aggr.) A/A: 12/396 = 0.030 A/G: 108/396 = 0.273 G/G: 276/396 = 0.697 GSE19189 vs. HapMap aggr. 0 3 15 12 108 276 0 3 15 1 35 77 0 3 15 1 16 67 0 3 15 0 13 73 1 0.302 8.598E-01 0.003 9.988E-01 GSE19189 vs. HapMap YRI HapMap YRI A/A: 10/113 = 0.088 A/G: 44/113 = 0.389 G/G: 59/113 = 0.522 5.353E-01 GSE19189 vs. HapMap JPT HapMap JPT A/A: 0/86 = 0.000 A/G: 13/86 = 0.151 G/G: 73/86 = 0.849 1.25 GSE19189 vs. HapMap CHB HapMap CHB A/A: 1/84 = 0.012 A/G: 16/84 = 0.190 G/G: 67/84 = 0.798 3.725E-01 GSE19189 vs. HapMap CEU HapMap CEU A/A: 1/113 = 0.009 A/G: 35/113 = 0.310 G/G: 77/113 = 0.681 1.975 0 3 15 G and p-values are Williams’-corrected 10 44 59 7.354 2.530E-02
e. GSE10506: Nancarrow et al. (2008) Genotype frequencies Nancarrow, GSE10506 GSE10506 HapMap G1 p-value1 A/A: 0/21 = 0.000 A/G: 6/21 = 0.286 G/G: 15/21 = 0.714 HapMap Phase 1 (aggr.) A/A: 12/396 = 0.030 A/G: 108/396 = 0.273 G/G: 276/396 = 0.697 GSE10506 vs. HapMap aggr. 0 6 15 12 108 276 0 6 15 1 35 77 0 6 15 1 16 67 0 6 15 0 13 73 1 0.871 6.469E-01 1.818 4.029E-01 GSE10506 vs. HapMap YRI HapMap YRI A/A: 10/113 = 0.088 A/G: 44/113 = 0.389 G/G: 59/113 = 0.522 8.794E-01 GSE10506 vs. HapMap JPT HapMap JPT A/A: 0/86 = 0.000 A/G: 13/86 = 0.151 G/G: 73/86 = 0.849 0.257 GSE10506 vs.
HapMap CHB HapMap CHB A/A: 1/84 = 0.012 A/G: 16/84 = 0.190 G/G: 67/84 = 0.798 5.793E-01 GSE10506 vs. HapMap CEU HapMap CEU A/A: 1/113 = 0.009 A/G: 35/113 = 0.310 G/G: 77/113 = 0.681 1.092 0 6 15 G and p-values are Williams’-corrected 10 44 59 4.704 9.518E-02
f. GSE18799: Popova et al. (2009) Genotype frequencies Popova, GSE18799 GSE18799 HapMap G1 p-value1 A/A: 2/21 = 0.095 A/G: 8/21 = 0.381 G/G: 11/21 = 0.524 HapMap Phase 1 (aggr.) A/A: 12/396 = 0.030 A/G: 108/396 = 0.273 G/G: 276/396 = 0.697 GSE18799 vs. HapMap aggr. 2 8 11 12 108 276 2 8 11 1 35 77 2 8 11 1 16 67 2 8 11 0 13 73 1 6.22 4.460E-02 10.316 5.753E-03 GSE18799 vs. HapMap YRI HapMap YRI A/A: 10/113 = 0.088 A/G: 44/113 = 0.389 G/G: 59/113 = 0.522 1.285E-01 GSE18799 vs. HapMap JPT HapMap JPT A/A: 0/86 = 0.000 A/G: 13/86 = 0.151 G/G: 73/86 = 0.849 4.103 GSE18799 vs. HapMap CHB HapMap CHB A/A: 1/84 = 0.012 A/G: 16/84 = 0.190 G/G: 67/84 = 0.798 2.225E-01 GSE18799 vs. HapMap CEU HapMap CEU A/A: 1/113 = 0.009 A/G: 35/113 = 0.310 G/G: 77/113 = 0.681 3.006 2 8 11 G and p-values are Williams’-corrected 10 44 59 0.011 9.945E-01
g. GSE9003: Stark & Hayward (2007) Genotype frequencies Stark, GSE9003 GSE9003 HapMap G1 p-value1 A/A: 5/72 = 0.069 A/G: 11/72 = 0.153 G/G: 56/72 = 0.778 HapMap Phase 1 (aggr.) A/A: 12/396 = 0.030 A/G: 108/396 = 0.273 G/G: 276/396 = 0.697 GSE9003 vs. HapMap aggr. 5 11 56 12 108 276 5 11 56 1 35 77 5 11 56 1 16 67 5 11 56 0 13 73 1 3.717 1.559E-01 7.634 2.199E-02 GSE9003 vs. HapMap YRI HapMap YRI A/A: 10/113 = 0.088 A/G: 44/113 = 0.389 G/G: 59/113 = 0.522 7.654E-03 GSE9003 vs. HapMap JPT HapMap JPT A/A: 0/86 = 0.000 A/G: 13/86 = 0.151 G/G: 73/86 = 0.849 9.745 GSE9003 vs. HapMap CHB HapMap CHB A/A: 1/84 = 0.012 A/G: 16/84 = 0.190 G/G: 67/84 = 0.798 4.154E-02 GSE9003 vs. HapMap CEU HapMap CEU A/A: 1/113 = 0.009 A/G: 35/113 = 0.310 G/G: 77/113 = 0.681 6.362 5 11 56 G and p-values are Williams’-corrected 10 44 59 13.499 1.171E-03
h. GSE19177: Waddell et al.
(2010) Genotype frequencies Waddell, GSE19177 GSE19177 HapMap G1 p-value1 A/A: 0/32 = 0.000 A/G: 6/32 = 0.188 G/G: 26/32 = 0.813 HapMap Phase 1 (aggr.) A/A: 12/396 = 0.030 A/G: 108/396 = 0.273 G/G: 276/396 = 0.697 GSE19177 vs. HapMap aggr. 0 6 26 12 108 276 0 6 26 1 35 77 0 6 26 1 16 67 0 6 26 0 13 73 1 0.484 7.851E-01 0.21 9.003E-01 GSE19177 vs. HapMap YRI HapMap YRI A/A: 10/113 = 0.088 A/G: 44/113 = 0.389 G/G: 59/113 = 0.522 4.082E-01 GSE19177 vs. HapMap JPT HapMap JPT A/A: 0/86 = 0.000 A/G: 13/86 = 0.151 G/G: 73/86 = 0.849 1.792 GSE19177 vs. HapMap CHB HapMap CHB A/A: 1/84 = 0.012 A/G: 16/84 = 0.190 G/G: 67/84 = 0.798 2.249E-01 GSE19177 vs. HapMap CEU HapMap CEU A/A: 1/113 = 0.009 A/G: 35/113 = 0.310 G/G: 77/113 = 0.681 2.984 0 6 26 G and p-values are Williams’-corrected 10 44 59 11.126 3.837E-03
7. rs10489789 (LGALS8)
a. GSE16019: Chen M et al. (2009) Genotype frequencies Chen, GSE16019 GSE16019 HapMap G1 p-value1 G/G: 63/75 = 0.840 G/A: 10/75 = 0.133 A/A: 2/75 = 0.027 HapMap Phase 1 (aggr.) G/G: 298/384 = 0.776 G/A: 74/384 = 0.193 A/A: 12/384 = 0.031 GSE16019 vs. HapMap aggr. 63 10 2 298 74 12 63 10 2 83 27 1 63 10 2 78 2 0 63 10 2 81 2 0 1 8.752 1.258E-02 9.11 1.051E-02 GSE16019 vs. HapMap YRI HapMap YRI G/G: 56/110 = 0.509 G/A: 43/110 = 0.391 A/A: 11/110 = 0.100 1.478E-01 GSE16019 vs. HapMap JPT HapMap JPT G/G: 81/83 = 0.976 G/A: 2/83 = 0.024 A/A: 0/83 = 0 3.824 GSE16019 vs. HapMap CHB HapMap CHB G/G: 78/80 = 0.975 G/A: 2/80 = 0.025 A/A: 0/80 = 0 4.480E-01 GSE16019 vs. HapMap CEU HapMap CEU G/G: 83/111 = 0.748 G/A: 27/111 = 0.243 A/A: 1/111 = 0.009 1.606 63 10 2 G and p-values are Williams’-corrected 56 43 11 22.172 1.533E-05
b. GSE13282: Gordan et al. (2008) Genotype frequencies Gordan, GSE13282 GSE13282 HapMap G1 p-value1 G/G: 16/21 = 0.762 G/A: 4/21 = 0.190 A/A: 1/21 = 0.048 HapMap Phase 1 (aggr.) G/G: 298/384 = 0.776 G/A: 74/384 = 0.193 A/A: 12/384 = 0.031 GSE13282 vs. HapMap aggr.
16 4 1 298 74 12 16 4 1 83 27 1 16 4 1 78 2 0 16 4 1 81 2 0 1 6.597 3.694E-02 6.723 3.468E-02 GSE13282 vs. HapMap YRI HapMap YRI G/G: 56/110 = 0.509 G/A: 43/110 = 0.391 A/A: 11/110 = 0.100 5.641E-01 GSE13282 vs. HapMap JPT HapMap JPT G/G: 81/83 = 0.976 G/A: 2/83 = 0.024 A/A: 0/83 = 0 1.145 GSE13282 vs. HapMap CHB HapMap CHB G/G: 78/80 = 0.975 G/A: 2/80 = 0.025 A/A: 0/80 = 0 9.361E-01 GSE13282 vs. HapMap CEU HapMap CEU G/G: 83/111 = 0.748 G/A: 27/111 = 0.243 A/A: 1/111 = 0.009 0.132 16 4 1 G and p-values are Williams’-corrected 56 43 11 4.535 1.036E-01 159 c. GSE19189: Letouze et al. (2010) Genotype frequencies Letouze, GSE19189 GSE19189 HapMap G1 p-value1 G/G: 16/21 = 0.762 G/A: 4/21 = 0.19 A/A: 1/21 = 0.048 HapMap Phase 1 (aggr.) G/G: 298/384 = 0.776 G/A: 74/384 = 0.193 A/A: 12/384 = 0.031 GSE19189 vs. HapMap aggr. 16 4 1 298 74 12 16 4 1 83 27 1 16 4 1 78 2 0 16 4 1 81 2 0 1 6.597 3.694E-02 6.723 3.468E-02 GSE19189 vs. HapMap YRI HapMap YRI G/G: 56/110 = 0.509 G/A: 43/110 = 0.391 A/A: 11/110 = 0.100 5.641E-01 GSE19189 vs. HapMap JPT HapMap JPT G/G: 81/83 = 0.976 G/A: 2/83 = 0.024 A/A: 0/83 = 0 1.145 GSE19189 vs. HapMap CHB HapMap CHB G/G: 78/80 = 0.975 G/A: 2/80 = 0.025 A/A: 0/80 = 0 9.361E-01 GSE19189 vs. HapMap CEU HapMap CEU G/G: 83/111 = 0.748 G/A: 27/111 = 0.243 A/A: 1/111 = 0.009 0.132 16 4 1 G and p-values are Williams’-corrected 56 43 11 4.535 1.036E-01 160 d. GSE10506: Nancarrow et al. (2008) Genotype frequencies Nancarrow, GSE10506 GSE10506 HapMap G1 p-value1 G/G: 18/21 = 0.857 G/A: 2/21 = 0.095 A/A: 1/21 = 0.048 HapMap Phase 1 (aggr.) G/G: 298/384 = 0.776 G/A: 74/384 = 0.193 A/A: 12/384 = 0.031 GSE10506 vs. HapMap aggr. 18 2 1 298 74 12 18 2 1 83 27 1 18 2 1 78 2 0 18 2 1 81 2 0 1 3.31 1.911E-01 3.378 1.847E-01 GSE10506 vs. HapMap YRI HapMap YRI G/G: 56/110 = 0.509 G/A: 43/110 = 0.391 A/A: 11/110 = 0.100 2.407E-01 GSE10506 vs. HapMap JPT HapMap JPT G/G: 81/83 = 0.976 G/A: 2/83 = 0.024 A/A: 0/83 = 0 2.848 GSE10506 vs. 
HapMap CHB HapMap CHB G/G: 78/80 = 0.975 G/A: 2/80 = 0.025 A/A: 0/80 = 0 5.132E-01 GSE10506 vs. HapMap CEU HapMap CEU G/G: 83/111 = 0.748 G/A: 27/111 = 0.243 A/A: 1/111 = 0.009 1.334 18 2 1 G and p-values are Williams’-corrected 56 43 11 9.407 9.063E-03 161 e. GSE18799: Popova et al. (2009) Genotype frequencies Popova, GSE18799 GSE18799 HapMap G1 p-value1 G/G: 14/21 = 0.667 G/A: 6/21 = 0.286 A/A: 1/21 = 0.048 HapMap Phase 1 (aggr.) G/G: 298/384 = 0.776 G/A: 74/384 = 0.193 A/A: 12/384 = 0.031 GSE18799 vs. HapMap aggr. 14 6 1 298 74 12 14 6 1 83 27 1 14 6 1 78 2 0 14 6 1 81 2 0 1 10.701 4.746E-03 10.881 4.337E-03 GSE18799 vs. HapMap YRI HapMap YRI G/G: 56/110 = 0.509 G/A: 43/110 = 0.391 A/A: 11/110 = 0.100 5.510E-01 GSE18799 vs. HapMap JPT HapMap JPT G/G: 81/83 = 0.976 G/A: 2/83 = 0.024 A/A: 0/83 = 0 1.192 GSE18799 vs. HapMap CHB HapMap CHB G/G: 78/80 = 0.975 G/A: 2/80 = 0.025 A/A: 0/80 = 0 5.819E-01 GSE18799 vs. HapMap CEU HapMap CEU G/G: 83/111 = 0.748 G/A: 27/111 = 0.243 A/A: 1/111 = 0.009 1.083 14 6 1 G and p-values are Williams’-corrected 56 43 11 1.843 3.979E-01 162 f. GSE9003: Stark & Hayward (2007) Genotype frequencies Stark, GSE9003 GSE9003 HapMap G1 p-value1 G/G: 59/73 = 0.808 G/A: 12/73 = 0.164 A/A: 2/73 = 0.027 HapMap Phase 1 (aggr.) G/G: 298/384 = 0.776 G/A: 74/384 = 0.193 A/A: 12/384 = 0.031 GSE9003 vs. HapMap aggr. 59 12 2 298 74 12 59 12 2 83 27 1 59 12 2 78 2 0 59 12 2 81 2 0 1 11.387 3.368E-03 11.827 2.703E-03 GSE9003 vs. HapMap YRI HapMap YRI G/G: 56/110 = 0.509 G/A: 43/110 = 0.391 A/A: 11/110 = 0.100 3.296E-01 GSE9003 vs. HapMap JPT HapMap JPT G/G: 81/83 = 0.976 G/A: 2/83 = 0.024 A/A: 0/83 = 0 2.22 GSE9003 vs. HapMap CHB HapMap CHB G/G: 78/80 = 0.975 G/A: 2/80 = 0.025 A/A: 0/80 = 0 8.328E-01 GSE9003 vs. HapMap CEU HapMap CEU G/G: 83/111 = 0.748 G/A: 27/111 = 0.243 A/A: 1/111 = 0.009 0.366 59 12 2 G and p-values are Williams’-corrected 56 43 11 17.492 1.591E-04 163 g. GSE19177: Waddell et al. 
(2010) Genotype frequencies Waddell, GSE19177 GSE19177 HapMap G1 p-value1 G/G: 21/29 = 0.724 G/A: 6/29 = 0.207 A/A: 2/29 = 0.069 HapMap Phase 1 (aggr.) G/G: 298/384 = 0.776 G/A: 74/384 = 0.193 A/A: 12/384 = 0.031 HapMap CEU G/G: 83/111 = 0.748 G/A: 27/111 = 0.243 A/A: 1/111 = 0.009 HapMap CHB G/G: 78/80 = 0.975 G/A: 2/80 = 0.025 A/A: 0/80 = 0 GSE19177 vs. HapMap aggr. 21 6 2 21 6 2 21 6 2 298 74 12 83 27 1 78 2 0 21 6 2 81 2 0 1 GSE19177 vs. HapMap CEU 2.677 2.622E-01 GSE19177 vs. HapMap CHB 12.323 2.109E-03 12.62 1.818E-03 GSE19177 vs. HapMap YRI HapMap YRI G/G: 56/110 = 0.509 G/A: 43/110 = 0.391 A/A: 11/110 = 0.100 6.316E-01 GSE19177 vs. HapMap JPT HapMap JPT G/G: 81/83 = 0.976 G/A: 2/83 = 0.024 A/A: 0/83 = 0 0.919 21 6 2 G and p-values are Williams’-corrected 56 43 11 4.353 1.134E-01

APPENDIX C

Association analysis, allele frequencies

For each SNP, allele frequencies were compared between datasets archived at the Gene Expression Omnibus (identified here by GEO accession number and associated research article, e.g., GSE16019: Chen M et al. (2009)) and HapMap Phase 1 populations, both individual and aggregated, as controls. Williams’ corrected G-test for independence was used to test for association. The statistical significance of each association analysis is shown in the tables below.

1. rs428007 (CLC/LGALS10)
a. GSE16019: Chen M et al. (2009) Allele frequencies Chen, GSE16019 GSE16019 HapMap G1 p-value1 T: 66/154 = 0.429 C: 88/154 = 0.571 HapMap Phase 1 (aggr.) T: 324/790 = 0.410 C: 466/790 = 0.590 GSE16019 vs. HapMap aggr. 66 88 324 466 66 88 68 156 66 88 106 62 66 88 110 62 1 13.247 2.730E-04 14.585 1.340E-04 GSE16019 vs. HapMap YRI HapMap YRI T: 40/226 = 0.177 C: 186/226 = 0.823 1.302E-02 GSE16019 vs. HapMap JPT HapMap JPT T: 110/172 = 0.640 C: 62/172 = 0.360 6.167 GSE16019 vs. HapMap CHB HapMap CHB T: 106/168 = 0.631 C: 62/168 = 0.369 6.722E-01 GSE16019 vs.
HapMap CEU HapMap CEU T: 68/224 = 0.304 C: 156/224 = 0.696 0.179 66 88 G and p-values are Williams’-corrected 40 186 28.401 9.861E-08 166 b. GSE13282: Gordan et al. (2008) Allele frequencies Gordan, GSE13282 GSE13282 HapMap G1 p-value1 T: 14/42 = 0.333 C: 28/42 = 0.667 HapMap Phase 1 (aggr.) T: 324/790 = 0.410 C: 466/790 = 0.590 GSE13282 vs. HapMap aggr. 14 28 324 466 14 28 68 156 14 28 106 62 14 28 110 62 1 11.962 5.430E-04 12.743 3.573E-04 GSE13282 vs. HapMap YRI HapMap YRI T: 40/226 = 0.177 C: 186/226 = 0.823 7.053E-01 GSE13282 vs. HapMap JPT HapMap JPT T: 110/172 = 0.640 C: 62/172 = 0.360 0.143 GSE13282 vs. HapMap CHB HapMap CHB T: 106/168 = 0.631 C: 62/168 = 0.369 3.210E-01 GSE13282 vs. HapMap CEU HapMap CEU T: 68/224 = 0.304 C: 156/224 = 0.696 0.985 14 28 G and p-values are Williams’-corrected 40 186 4.759 2.915E-02 167 c. GSE19189: Letouze et al. (2010) Allele frequencies Letouze, GSE19189 GSE19189 HapMap G1 p-value1 T: 13/40 = 0.325 C: 27/40 = 0.675 HapMap Phase 1 (aggr.) T: 324/790 = 0.410 C: 466/790 = 0.590 GSE19189 vs. HapMap aggr. 13 27 324 466 13 27 68 156 13 27 106 62 13 27 110 62 1 12.16 7.884E-01 12.929 3.235E-04 GSE19189 vs. HapMap YRI HapMap YRI T: 40/226 = 0.177 C: 186/226 = 0.823 7.884E-01 GSE19189 vs. HapMap JPT HapMap JPT T: 110/172 = 0.640 C: 62/172 = 0.360 0.072 GSE19189 vs. HapMap CHB HapMap CHB T: 106/168 = 0.631 C: 62/168 = 0.369 2.819E-01 GSE19189 vs. HapMap CEU HapMap CEU T: 68/224 = 0.304 C: 156/224 = 0.696 1.158 13 27 G and p-values are Williams’-corrected 40 186 4.125 4.225E-02 168 d. GSE10506: Nancarrow et al. (2008) Allele frequencies Nancarrow, GSE10506 GSE10506 HapMap G1 p-value1 T: 15/40 = 0.375 C: 25/40 = 0.625 HapMap Phase 1 (aggr.) T: 324/790 = 0.410 C: 466/790 = 0.590 GSE10506 vs. HapMap aggr. 15 25 324 466 15 25 68 156 15 25 106 62 15 25 110 62 1 8.488 3.575E-03 9.129 2.516E-03 GSE10506 vs. HapMap YRI HapMap YRI T: 40/226 = 0.177 C: 186/226 = 0.823 3.799E-01 GSE10506 vs. 
HapMap JPT HapMap JPT T: 110/172 = 0.640 C: 62/172 = 0.360 0.771 GSE10506 vs. HapMap CHB HapMap CHB T: 106/168 = 0.631 C: 62/168 = 0.369 6.596E-01 GSE10506 vs. HapMap CEU HapMap CEU T: 68/224 = 0.304 C: 156/224 = 0.696 0.194 15 25 G and p-values are Williams’-corrected 40 186 7.056 7.900E-03
e. GSE18799: Popova et al. (2009) Allele frequencies Popova, GSE18799 GSE18799 HapMap G1 p-value1 T: 15/42 = 0.357 C: 27/42 = 0.643 HapMap Phase 1 (aggr.) T: 324/790 = 0.410 C: 466/790 = 0.590 GSE18799 vs. HapMap aggr. 15 27 324 466 15 27 68 156 15 27 106 62 15 27 110 62 1 10.111 3.147E-01 10.827 1.000E-03 GSE18799 vs. HapMap YRI HapMap YRI T: 40/226 = 0.177 C: 186/226 = 0.823 4.990E-01 GSE18799 vs. HapMap JPT HapMap JPT T: 110/172 = 0.640 C: 62/172 = 0.360 0.457 GSE18799 vs. HapMap CHB HapMap CHB T: 106/168 = 0.631 C: 62/168 = 0.369 4.958E-01 GSE18799 vs. HapMap CEU HapMap CEU T: 68/224 = 0.304 C: 156/224 = 0.696 0.464 15 27 G and p-values are Williams’-corrected 40 186 6.181 1.291E-02
f. GSE9003: Stark & Hayward (2007) Allele frequencies Stark, GSE9003 GSE9003 HapMap G1 p-value1 T: 39/146 = 0.267 C: 107/146 = 0.733 HapMap Phase 1 (aggr.) T: 324/790 = 0.410 C: 466/790 = 0.590 GSE9003 vs. HapMap aggr. 39 107 324 466 39 107 68 156 39 107 106 62 39 107 110 62 1 42.546 6.904E-11 45.03 1.940E-11 GSE9003 vs. HapMap YRI HapMap YRI T: 40/226 = 0.177 C: 186/226 = 0.823 4.495E-01 GSE9003 vs. HapMap JPT HapMap JPT T: 110/172 = 0.640 C: 62/172 = 0.360 0.572 GSE9003 vs. HapMap CHB HapMap CHB T: 106/168 = 0.631 C: 62/168 = 0.369 8.950E-04 GSE9003 vs. HapMap CEU HapMap CEU T: 68/224 = 0.304 C: 156/224 = 0.696 11.033 39 107 G and p-values are Williams’-corrected 40 186 4.209 4.021E-02
g. GSE19177: Waddell et al. (2010) Allele frequencies Waddell, GSE19177 GSE19177 HapMap G1 p-value1 T: 23/62 = 0.371 C: 39/62 = 0.629 HapMap Phase 1 (aggr.) T: 324/790 = 0.410 C: 466/790 = 0.590 GSE19177 vs. HapMap aggr.
23 39 324 466 23 39 68 156 23 39 106 62 23 39 110 62 1 12.307 4.513E-04 13.241 2.739E-04 GSE19177 vs. HapMap YRI HapMap YRI T: 40/226 = 0.177 C: 186/226 = 0.823 3.202E-01 GSE19177 vs. HapMap JPT HapMap JPT T: 110/172 = 0.640 C: 62/172 = 0.360 0.988 GSE19177 vs. HapMap CHB HapMap CHB T: 106/168 = 0.631 C: 62/168 = 0.369 5.452E-01 GSE19177 vs. HapMap CEU HapMap CEU T: 68/224 = 0.304 C: 156/224 = 0.696 0.366 23 39 G and p-values are Williams’-corrected 40 186 9.683 1.860E-03 172 2. rs929039 (LGALS1) a. GSE16019: Chen M et al. (2009) Allele frequencies Chen, GSE16019 GSE16019 HapMap G1 p-value1 T: 102/150 = 0.680 C: 48/150 = 0.320 HapMap Phase 1 (aggr.) T: 575/792 = 0.726 C: 217/792 = 0.274 GSE16019 vs. HapMap aggr. 102 48 575 217 102 48 152 74 102 48 132 36 102 48 119 53 1 4.528 3.334E-02 0.052 8.196E-01 GSE16019 vs. HapMap YRI HapMap YRI T: 172/226 = 0.761 C: 54/226 = 0.239 8.795E-01 GSE16019 vs. HapMap JPT HapMap JPT T: 119/172 = 0.692 C: 53/172 = 0.308 0.023 GSE16019 vs. HapMap CHB HapMap CHB T: 132/168 = 0.786 C: 36/168 = 0.214 2.566E-01 GSE16019 vs. HapMap CEU HapMap CEU T: 152/226 = 0.673 C: 74/226 = 0.327 1.287 102 48 G and p-values are Williams’-corrected 172 54 2.951 8.582E-02 173 b. GSE13282: Gordan et al. (2008) Allele frequencies Gordan, GSE13282 GSE13282 HapMap G1 p-value1 T: 22/36 = 0.611 C: 14/36 = 0.389 HapMap Phase 1 (aggr.) T: 575/792 = 0.726 C: 217/792 = 0.274 GSE13282 vs. HapMap aggr. 22 14 575 217 22 14 152 74 22 14 132 36 22 14 119 53 1 4.423 3.546E-02 0.852 3.560E-01 GSE13282 vs. HapMap YRI HapMap YRI T: 172/226 = 0.761 C: 54/226 = 0.239 4.760E-01 GSE13282 vs. HapMap JPT HapMap JPT T: 119/172 = 0.692 C: 53/172 = 0.308 0.508 GSE13282 vs. HapMap CHB HapMap CHB T: 132/168 = 0.786 C: 36/168 = 0.214 1.488E-01 GSE13282 vs. HapMap CEU HapMap CEU T: 152/226 = 0.673 C: 74/226 = 0.327 2.084 22 14 G and p-values are Williams’-corrected 172 54 3.317 6.857E-02 174 c. GSE19189: Letouze et al. 
(2010) Allele frequencies Letouze, GSE19189 GSE19189 HapMap G1 p-value1 T: 24/38 = 0.632 C: 14/38 = 0.368 HapMap Phase 1 (aggr.) T: 575/792 = 0.726 C: 217/792 = 0.274 GSE19189 vs. HapMap aggr. 24 14 575 217 24 14 152 74 24 14 132 36 24 14 119 53 1 3.658 5.580E-02 0.503 4.782E-01 GSE19189 vs. HapMap YRI HapMap YRI T: 172/226 = 0.761 C: 54/226 = 0.239 6.249E-01 GSE19189 vs. HapMap JPT HapMap JPT T: 119/172 = 0.692 C: 53/172 = 0.308 0.239 GSE19189 vs. HapMap CHB HapMap CHB T: 132/168 = 0.786 C: 36/168 = 0.214 2.207E-01 GSE19189 vs. HapMap CEU HapMap CEU T: 152/226 = 0.673 C: 74/226 = 0.327 1.5 24 14 G and p-values are Williams’-corrected 172 54 2.627 1.051E-01
d. GSE10506: Nancarrow et al. (2008) Allele frequencies Nancarrow, GSE10506 GSE10506 HapMap G1 p-value1 T: 30/40 = 0.750 C: 10/40 = 0.250 HapMap Phase 1 (aggr.) T: 575/792 = 0.726 C: 217/792 = 0.274 GSE10506 vs. HapMap aggr. 30 10 575 217 30 10 152 74 30 10 132 36 30 10 119 53 1 0.229 6.323E-01 0.53 4.666E-01 GSE10506 vs. HapMap YRI HapMap YRI T: 172/226 = 0.761 C: 54/226 = 0.239 3.267E-01 GSE10506 vs. HapMap JPT HapMap JPT T: 119/172 = 0.692 C: 53/172 = 0.308 0.962 GSE10506 vs. HapMap CHB HapMap CHB T: 132/168 = 0.786 C: 36/168 = 0.214 7.401E-01 GSE10506 vs. HapMap CEU HapMap CEU T: 152/226 = 0.673 C: 74/226 = 0.327 0.11 30 10 G and p-values are Williams’-corrected 172 54 0.023 8.795E-01
e. GSE18799: Popova et al. (2009) Allele frequencies Popova, GSE18799 GSE18799 HapMap G1 p-value1 T: 22/38 = 0.579 C: 16/38 = 0.421 HapMap Phase 1 (aggr.) T: 575/792 = 0.726 C: 217/792 = 0.274 GSE18799 vs. HapMap aggr. 22 16 575 217 22 16 152 74 22 16 132 36 22 16 119 53 1 6.343 1.178E-02 1.717 1.901E-01 GSE18799 vs. HapMap YRI HapMap YRI T: 172/226 = 0.761 C: 54/226 = 0.239 2.700E-01 GSE18799 vs. HapMap JPT HapMap JPT T: 119/172 = 0.692 C: 53/172 = 0.308 1.217 GSE18799 vs. HapMap CHB HapMap CHB T: 132/168 = 0.786 C: 36/168 = 0.214 5.991E-02 GSE18799 vs.
HapMap CEU HapMap CEU T: 152/226 = 0.673 C: 74/226 = 0.327 3.54 22 16 G and p-values are Williams’-corrected 172 54 5.029 2.493E-02 177 f. GSE9003: Stark & Hayward (2007) Allele frequencies Stark, GSE9003 GSE9003 HapMap G1 p-value1 T: 88/128 = 0.688 C: 40/128 = 0.313 HapMap Phase 1 (aggr.) T: 575/792 = 0.726 C: 217/792 = 0.274 GSE9003 vs. HapMap aggr. 88 40 575 217 88 40 152 74 88 40 132 36 88 40 119 53 1 3.621 5.705E-02 0.006 9.383E-01 GSE9003 vs. HapMap YRI HapMap YRI T: 172/226 = 0.761 C: 54/226 = 0.239 7.733E-01 GSE9003 vs. HapMap JPT HapMap JPT T: 119/172 = 0.692 C: 53/172 = 0.308 0.083 GSE9003 vs. HapMap CHB HapMap CHB T: 132/168 = 0.786 C: 36/168 = 0.214 3.735E-01 GSE9003 vs. HapMap CEU HapMap CEU T: 152/226 = 0.673 C: 74/226 = 0.327 0.792 88 40 G and p-values are Williams’-corrected 172 54 2.223 1.360E-01 178 g. GSE19177: Waddell et al. (2010) Allele frequencies Waddell, GSE19177 GSE19177 HapMap G1 p-value1 T: 37/64 = 0.578 C: 27/64 = 0.422 HapMap Phase 1 (aggr.) T: 575/792 = 0.726 C: 217/792 = 0.274 GSE19177 vs. HapMap aggr. 37 27 575 217 37 27 152 74 37 27 132 36 37 27 119 53 1 9.502 2.052E-03 2.614 1.059E-01 GSE19177 vs. HapMap YRI HapMap YRI T: 172/226 = 0.761 C: 54/226 = 0.239 1.676E-01 GSE19177 vs. HapMap JPT HapMap JPT T: 119/172 = 0.692 C: 53/172 = 0.308 1.904 GSE19177 vs. HapMap CHB HapMap CHB T: 132/168 = 0.786 C: 36/168 = 0.214 1.540E-02 GSE19177 vs. HapMap CEU HapMap CEU T: 152/226 = 0.673 C: 74/226 = 0.327 5.87 37 27 G and p-values are Williams’-corrected 172 54 7.762 5.336E-03 179 3. rs2235338 (LGALS2) a. GSE21168: Castillo et al. (2010) Allele frequencies Castillo, GSE21168 GSE21168 HapMap G1 p-value1 G: 5/8 = 0.625 A: 3/8 = 0.375 HapMap Phase 1 (aggr.) G: 440/792 = 0.556 A: 352/792 = 0.445 GSE21168 vs. HapMap aggr. 5 3 440 352 5 3 84 142 5 3 118 50 5 3 85 87 1 0.194 6.596E-01 0.498 4.804E-01 GSE21168 vs. HapMap YRI HapMap YRI G: 153/226 = 0.677 A: 73/226 = 0.323 1.684E-01 GSE21168 vs. 
HapMap JPT HapMap JPT G: 85/172 = 0.494 A: 87/172 = 0.506 1.897 GSE21168 vs. HapMap CHB HapMap CHB G: 118/168 = 0.702 A: 50/168 = 0.298 7.005E-01 GSE21168 vs. HapMap CEU HapMap CEU G: 84/226 = 0.372 A: 142/226 = 0.628 0.148 5 3 G and p-values are Williams’-corrected 153 73 0.087 7.680E-01 180 b. GSE16019: Chen M et al. (2009) Allele frequencies Chen, GSE16019 GSE16019 HapMap G1 p-value1 G: 66/152 = 0.434 A: 86/152 = 0.566 HapMap Phase 1 (aggr.) G: 440/792 = 0.556 A: 352/792 = 0.445 GSE16019 vs. HapMap aggr. 66 86 440 352 66 86 84 142 66 86 118 50 66 86 85 87 1 23.626 1.170E-06 1.163 2.809E-01 GSE16019 vs. HapMap YRI HapMap YRI G: 153/226 = 0.677 A: 73/226 = 0.323 2.247E-01 GSE16019 vs. HapMap JPT HapMap JPT G: 85/172 = 0.494 A: 87/172 = 0.506 1.474 GSE16019 vs. HapMap CHB HapMap CHB G: 118/168 = 0.702 A: 50/168 = 0.298 6.139E-03 GSE16019 vs. HapMap CEU HapMap CEU G: 84/226 = 0.372 A: 142/226 = 0.628 7.509 66 86 G and p-values are Williams’-corrected 153 73 21.922 2.840E-06 181 c. GSE13282: Gordan et al. (2008) Allele frequencies Gordan, GSE13282 GSE13282 HapMap G1 p-value1 G: 17/40 = 0.425 A: 23/40 = 0.575 HapMap Phase 1 (aggr.) G: 440/792 = 0.556 A: 352/792 = 0.445 GSE13282 vs. HapMap aggr. 17 23 440 352 17 23 84 142 17 23 118 50 17 23 85 87 1 10.316 1.319E-03 0.617 4.322E-01 GSE13282 vs. HapMap YRI HapMap YRI G: 153/226 = 0.677 A: 73/226 = 0.323 5.271E-01 GSE13282 vs. HapMap JPT HapMap JPT G: 85/172 = 0.494 A: 87/172 = 0.506 0.4 GSE13282 vs. HapMap CHB HapMap CHB G: 118/168 = 0.702 A: 50/168 = 0.298 1.086E-01 GSE13282 vs. HapMap CEU HapMap CEU G: 84/226 = 0.372 A: 142/226 = 0.628 2.574 17 23 G and p-values are Williams’-corrected 153 73 8.859 2.916E-03 182 d. GSE19189: Letouze et al. (2010) Allele frequencies Letouze, GSE19189 GSE19189 HapMap G1 p-value1 G: 14/38 = 0.368 A: 24/38 = 0.632 HapMap Phase 1 (aggr.) G: 440/792 = 0.556 A: 352/792 = 0.445 GSE19189 vs. HapMap aggr. 
14 24 440 352 14 24 84 142 14 24 118 50 14 24 85 87 1 14.222 1.625E-04 1.973 1.601E-01 GSE19189 vs. HapMap YRI HapMap YRI G: 153/226 = 0.677 A: 73/226 = 0.323 9.748E-01 GSE19189 vs. HapMap JPT HapMap JPT G: 85/172 = 0.494 A: 87/172 = 0.506 0.001 GSE19189 vs. HapMap CHB HapMap CHB G: 118/168 = 0.702 A: 50/168 = 0.298 4.773E-01 GSE19189 vs. HapMap CEU HapMap CEU G: 84/226 = 0.372 A: 142/226 = 0.628 0.505 14 24 G and p-values are Williams’-corrected 153 73 12.633 3.790E-04 183 e. GSE10506: Nancarrow et al. (2008) Allele frequencies Nancarrow, GSE10506 GSE10506 HapMap G1 p-value1 G: 15/40 = 0.375 A: 25/40 = 0.625 HapMap Phase 1 (aggr.) G: 440/792 = 0.556 A: 352/792 = 0.445 GSE10506 vs. HapMap aggr. 15 25 440 352 15 25 84 142 15 25 118 50 15 25 85 87 1 14.26 1.592E-04 1.846 1.742E-01 GSE10506 vs. HapMap YRI HapMap YRI G: 153/226 = 0.677 A: 73/226 = 0.323 9.643E-01 GSE10506 vs. HapMap JPT HapMap JPT G: 85/172 = 0.494 A: 87/172 = 0.506 0.002 GSE10506 vs. HapMap CHB HapMap CHB G: 118/168 = 0.702 A: 50/168 = 0.298 2.627E-02 GSE10506 vs. HapMap CEU HapMap CEU G: 84/226 = 0.372 A: 142/226 = 0.628 4.938 15 25 G and p-values are Williams’-corrected 153 73 12.651 3.754E-04 184 f. GSE18799: Popova et al. (2009) Allele frequencies Popova, GSE18799 GSE18799 HapMap G1 p-value1 G: 22/38 = 0.579 A: 16/38 = 0.421 HapMap Phase 1 (aggr.) G: 440/792 = 0.556 A: 352/792 = 0.445 GSE18799 vs. HapMap aggr. 22 16 440 352 22 16 84 142 22 16 118 50 22 16 85 87 1 2.061 1.511E-01 0.886 3.466E-01 GSE18799 vs. HapMap YRI HapMap YRI G: 153/226 = 0.677 A: 73/226 = 0.323 1.782E-02 GSE18799 vs. HapMap JPT HapMap JPT G: 85/172 = 0.494 A: 87/172 = 0.506 5.614 GSE18799 vs. HapMap CHB HapMap CHB G: 118/168 = 0.702 A: 50/168 = 0.298 7.773E-01 GSE18799 vs. HapMap CEU HapMap CEU G: 84/226 = 0.372 A: 142/226 = 0.628 0.08 22 16 G and p-values are Williams’-corrected 153 73 1.339 2.472E-01 185 g. 
GSE9003: Stark & Hayward (2007) Allele frequencies Stark, GSE9003 GSE9003 HapMap G1 p-value1 G: 53/144 = 0.368 A: 91/144 = 0.632 HapMap Phase 1 (aggr.) G: 440/792 = 0.556 A: 352/792 = 0.445 GSE9003 vs. HapMap aggr. 53 91 440 352 53 91 84 142 53 91 118 50 53 91 85 87 1 35.414 2.666E-09 5.069 2.436E-02 GSE9003 vs. HapMap YRI HapMap YRI G: 153/226 = 0.677 A: 73/226 = 0.323 9.436E-01 GSE9003 vs. HapMap JPT HapMap JPT G: 85/172 = 0.494 A: 87/172 = 0.506 0.005 GSE9003 vs. HapMap CHB HapMap CHB G: 118/168 = 0.702 A: 50/168 = 0.298 3.345E-05 GSE9003 vs. HapMap CEU HapMap CEU G: 84/226 = 0.372 A: 142/226 = 0.628 17.211 53 91 G and p-values are Williams’-corrected 153 73 34.16 5.076E-09 186 h. GSE19177: Waddell et al. (2010) Allele frequencies Waddell, GSE19177 GSE19177 HapMap G1 p-value1 G: 31/68 = 0.456 A: 37/68 = 0.544 HapMap Phase 1 (aggr.) G: 440/792 = 0.556 A: 352/792 = 0.445 GSE19177 vs. HapMap aggr. 31 37 440 352 31 37 84 142 31 37 118 50 31 37 85 87 1 12.268 4.608E-04 0.285 5.934E-01 GSE19177 vs. HapMap YRI HapMap YRI G: 153/226 = 0.677 A: 73/226 = 0.323 2.167E-01 GSE19177 vs. HapMap JPT HapMap JPT G: 85/172 = 0.494 A: 87/172 = 0.506 1.526 GSE19177 vs. HapMap CHB HapMap CHB G: 118/168 = 0.702 A: 50/168 = 0.298 1.154E-01 GSE19177 vs. HapMap CEU HapMap CEU G: 84/226 = 0.372 A: 142/226 = 0.628 2.479 31 37 G and p-values are Williams’-corrected 153 73 10.554 1.159E-03 187 4. rs3763959 (LGALS9) a. GSE21168: Castillo et al. (2010) Allele frequencies Castillo, GSE21168 GSE21168 HapMap G1 p-value1 A: 3/8 = 0.375 G: 5/8 = 0.625 HapMap Phase 1 (aggr.) A: 240/790 = 0.304 G: 550/790 = 0.696 GSE21168 vs. HapMap aggr. 3 5 240 550 3 5 101 125 3 5 58 110 3 5 52 118 1 0.028 8.671E-01 0.153 6.957E-01 GSE21168 vs. HapMap YRI HapMap YRI A: 29/226 = 0.128 G: 197/226 = 0.872 6.947E-01 GSE21168 vs. HapMap JPT HapMap JPT A: 52/170 = 0.306 G: 118/170 = 0.694 0.154 GSE21168 vs. HapMap CHB HapMap CHB A: 58/168 = 0.345 G: 110/168 = 0.655 6.801E-01 GSE21168 vs. 
HapMap CEU HapMap CEU A: 101/226 = 0.447 G: 125/226 = 0.553 0.17 3 5 G and p-values are Williams’-corrected 29 197 2.563 1.094E-01 188 b. GSE16019: Chen M et al. (2009) Allele frequencies Chen, GSE16019 GSE16019 HapMap G1 p-value1 A: 65/154 = 0.422 G: 89/154 = 0.578 HapMap Phase 1 (aggr.) A: 240/790 = 0.304 G: 550/790 = 0.696 GSE16019 vs. HapMap aggr. 65 89 240 550 65 89 101 125 65 89 101 125 65 89 58 110 1 0.229 6.323E-01 2 1.573E-01 GSE16019 vs. HapMap YRI HapMap YRI A: 29/226 = 0.128 G: 197/226 = 0.872 6.323E-01 GSE16019 vs. HapMap JPT HapMap JPT A: 58/168 = 0.345 G: 110/168 = 0.655 0.229 GSE16019 vs. HapMap CHB HapMap CHB A: 101/226 = 0.447 G: 125/226 = 0.553 4.865E-03 GSE16019 vs. HapMap CEU HapMap CEU A: 101/226 = 0.447 G: 125/226 = 0.553 7.929 65 89 G and p-values are Williams’-corrected 29 197 41.98 9.221E-11 189 c. GSE13282: Gordan et al. (2008) Allele frequencies Gordan, GSE13282 GSE13282 HapMap G1 p-value1 A: 22/42 = 0.524 G: 20/42 = 0.476 HapMap Phase 1 (aggr.) A: 240/790 = 0.304 G: 550/790 = 0.696 GSE13282 vs. HapMap aggr. 22 20 240 550 22 20 101 125 22 20 58 110 22 20 52 118 1 4.379 3.638E-02 6.684 9.728E-03 GSE13282 vs. HapMap YRI HapMap YRI A: 29/226 = 0.128 G: 197/226 = 0.872 3.620E-01 GSE13282 vs. HapMap JPT HapMap JPT A: 52/170 = 0.306 G: 118/170 = 0.694 0.831 GSE13282 vs. HapMap CHB HapMap CHB A: 58/168 = 0.345 G: 110/168 = 0.655 4.242E-03 GSE13282 vs. HapMap CEU HapMap CEU A: 101/226 = 0.447 G: 125/226 = 0.553 8.177 22 20 G and p-values are Williams’-corrected 29 197 28.875 7.720E-08 190 d. GSE19189: Letouze et al. (2010) Allele frequencies Letouze, GSE19189 GSE19189 HapMap G1 p-value1 A: 15/38 = 0.395 G: 23/38 = 0.605 HapMap Phase 1 (aggr.) A: 240/790 = 0.304 G: 550/790 = 0.696 GSE19189 vs. HapMap aggr. 15 23 240 550 15 23 101 125 15 23 58 110 15 23 52 118 1 0.323 5.698E-01 1.074 3.000E-01 GSE19189 vs. HapMap YRI HapMap YRI A: 29/226 = 0.128 G: 197/226 = 0.872 5.502E-01 GSE19189 vs. 
HapMap JPT HapMap JPT A: 52/170 = 0.306 G: 118/170 = 0.694 0.357 GSE19189 vs. HapMap CHB HapMap CHB A: 58/168 = 0.345 G: 110/168 = 0.655 2.493E-01 GSE19189 vs. HapMap CEU HapMap CEU A: 101/226 = 0.447 G: 125/226 = 0.553 1.327 15 23 G and p-values are Williams’-corrected 29 197 13.346 2.590E-04 191 e. GSE10506: Nancarrow et al. (2008) Allele frequencies Nancarrow, GSE10506 GSE10506 HapMap G1 p-value1 A: 10/32 = 0.313 G: 22/32 = 0.688 HapMap Phase 1 (aggr.) A: 240/790 = 0.304 G: 550/790 = 0.696 GSE10506 vs. HapMap aggr. 10 22 240 550 10 22 101 125 10 22 58 110 10 22 52 118 1 0.128 7.205E-01 0.005 9.436E-01 GSE10506 vs. HapMap YRI HapMap YRI A: 29/226 = 0.128 G: 197/226 = 0.872 1.481E-01 GSE10506 vs. HapMap JPT HapMap JPT A: 52/170 = 0.306 G: 118/170 = 0.694 2.092 GSE10506 vs. HapMap CHB HapMap CHB A: 58/168 = 0.345 G: 110/168 = 0.655 9.165E-01 GSE10506 vs. HapMap CEU HapMap CEU A: 101/226 = 0.447 G: 125/226 = 0.553 0.011 10 22 G and p-values are Williams’-corrected 29 197 5.994 1.435E-02 192 f. GSE18799: Popova et al. (2009) Allele frequencies Popova, GSE18799 GSE18799 HapMap G1 p-value1 A: 11/36 = 0.306 G: 25/36 = 0.694 HapMap Phase 1 (aggr.) A: 240/790 = 0.304 G: 550/790 = 0.696 GSE18799 vs. HapMap aggr. 11 25 240 550 11 25 101 125 11 25 58 110 11 25 52 118 1 0.208 6.483E-01 <0.001 <1 GSE18799 vs. HapMap YRI HapMap YRI A: 29/226 = 0.128 G: 197/226 = 0.872 1.086E-01 GSE18799 vs. HapMap JPT HapMap JPT A: 52/170 = 0.306 G: 118/170 = 0.694 2.575 GSE18799 vs. HapMap CHB HapMap CHB A: 58/168 = 0.345 G: 110/168 = 0.655 <1 GSE18799 vs. HapMap CEU HapMap CEU A: 101/226 = 0.447 G: 125/226 = 0.553 <0.001 11 25 G and p-values are Williams’-corrected 29 197 6.203 1.275E-02 193 g. GSE9003: Stark & Hayward (2007) Allele frequencies Stark, GSE9003 GSE9003 HapMap G1 p-value1 A: 62/144 = 0.430 G: 72/144 = 0.500 HapMap Phase 1 (aggr.) A: 240/790 = 0.304 G: 550/790 = 0.696 GSE9003 vs. HapMap aggr. 
62 72 240 550 62 72 101 125 62 72 58 110 62 72 52 118 1 4.267 3.886E-02 7.81 5.196E-03 GSE9003 vs. HapMap YRI HapMap YRI A: 29/226 = 0.128 G: 197/226 = 0.872 7.706E-01 GSE9003 vs. HapMap JPT HapMap JPT A: 52/170 = 0.306 G: 118/170 = 0.694 0.085 GSE9003 vs. HapMap CHB HapMap CHB A: 58/168 = 0.345 G: 110/168 = 0.655 4.035E-04 GSE9003 vs. HapMap CEU HapMap CEU A: 101/226 = 0.447 G: 125/226 = 0.553 12.516 62 72 G and p-values are Williams’-corrected 29 197 48.535 3.244E-12 194 h. GSE19177: Waddell et al. (2010) Allele frequencies Waddell, GSE19177 GSE19177 HapMap G1 p-value1 A: 26/62 = 0.419 G: 36/62 = 0.581 HapMap Phase 1 (aggr.) A: 240/790 = 0.304 G: 550/790 = 0.696 GSE19177 vs. HapMap aggr. 26 36 240 550 26 36 101 125 26 36 58 110 26 36 52 118 1 1.051 3.053E-01 2.541 1.109E-01 GSE19177 vs. HapMap YRI HapMap YRI A: 29/226 = 0.128 G: 197/226 = 0.872 6.995E-01 GSE19177 vs. HapMap JPT HapMap JPT A: 52/170 = 0.306 G: 118/170 = 0.694 0.149 GSE19177 vs. HapMap CHB HapMap CHB A: 58/168 = 0.345 G: 110/168 = 0.655 6.607E-02 GSE19177 vs. HapMap CEU HapMap CEU A: 101/226 = 0.447 G: 125/226 = 0.553 3.378 26 36 G and p-values are Williams’-corrected 29 197 22.99 1.628E-06 195 5. rs4820294 (LGALS1) a. GSE21168: Castillo et al. (2010) Allele frequencies Castillo, GSE21168 GSE21168 HapMap G1 p-value1 G: 1/6 = 0.167 A: 5/6 = 0.833 HapMap Phase 1 (aggr.) G: 569/784 = 0.726 A: 215/784 = 0.274 GSE21168 vs. HapMap aggr. 1 5 569 215 1 5 150 74 1 5 132 36 1 5 116 52 1 8.894 2.861E-03 6.189 1.285E-02 GSE21168 vs. HapMap YRI HapMap YRI G: 171/224 = 0.763 A: 53/224 = 0.237 1.654E-02 GSE21168 vs. HapMap JPT HapMap JPT G: 116/168 = 0.690 A: 52/168 = 0.310 5.745 GSE21168 vs. HapMap CHB HapMap CHB G: 132/168 = 0.786 A: 36/168 = 0.214 6.915E-03 GSE21168 vs. HapMap CEU HapMap CEU G: 150/224 = 0.670 A: 74/224 = 0.330 7.295 1 5 G and p-values are Williams’-corrected 171 53 8.256 4.062E-03 196 b. GSE16019: Chen M et al. 
(2009) Allele frequencies Chen, GSE16019 GSE16019 HapMap G1 p-value1 G: 97/140 = 0.693 A: 43/140 = 0.307 HapMap Phase 1 (aggr.) G: 569/784 = 0.726 A: 215/784 = 0.274 GSE16019 vs. HapMap aggr. 97 43 569 215 97 43 150 74 97 43 132 36 97 43 116 52 1 3.418 6.449E-02 0.002 9.643E-01 GSE16019 vs. HapMap YRI HapMap YRI G: 171/224 = 0.763 A: 53/224 = 0.237 6.444E-01 GSE16019 vs. HapMap JPT HapMap JPT G: 116/168 = 0.690 A: 52/168 = 0.310 0.213 GSE16019 vs. HapMap CHB HapMap CHB G: 132/168 = 0.786 A: 36/168 = 0.214 4.288E-01 GSE16019 vs. HapMap CEU HapMap CEU G: 150/224 = 0.670 A: 74/224 = 0.330 0.626 97 43 G and p-values are Williams’-corrected 171 53 2.171 1.406E-01 197 c. GSE13282: Gordan et al. (2008) Allele frequencies Gordan, GSE13282 GSE13282 HapMap G1 p-value1 G: 21/30 = 0.7 A: 9/30 = 0.3 HapMap Phase 1 (aggr.) G: 569/784 = 0.726 A: 215/784 = 0.274 GSE13282 vs. HapMap aggr. 21 9 569 215 21 9 150 74 21 9 132 36 21 9 116 52 1 0.927 3.356E-01 0.011 9.165E-01 GSE13282 vs. HapMap YRI HapMap YRI G: 171/224 = 0.763 A: 53/224 = 0.237 6.005E-01 GSE13282 vs. HapMap JPT HapMap JPT G: 116/168 = 0.690 A: 52/168 = 0.310 0.11 GSE13282 vs. HapMap CHB HapMap CHB G: 132/168 = 0.786 A: 36/168 = 0.214 7.604E-01 GSE13282 vs. HapMap CEU HapMap CEU G: 150/224 = 0.670 A: 74/224 = 0.330 0.093 21 9 G and p-values are Williams’-corrected 171 53 0.541 4.620E-01 198 d. GSE19189: Letouze et al. (2010) Allele frequencies Letouze, GSE19189 GSE19189 HapMap G1 p-value1 G: 24/40 = 0.600 A: 16/40 = 0.400 HapMap Phase 1 (aggr.) G: 569/784 = 0.726 A: 215/784 = 0.274 GSE19189 vs. HapMap aggr. 24 16 569 215 24 16 150 74 24 16 132 36 24 16 116 52 1 5.41 2.002E-02 1.154 2.827E-01 GSE19189 vs. HapMap YRI HapMap YRI G: 171/224 = 0.763 A: 53/224 = 0.237 4.001E-01 GSE19189 vs. HapMap JPT HapMap JPT G: 116/168 = 0.690 A: 52/168 = 0.310 0.708 GSE19189 vs. HapMap CHB HapMap CHB G: 132/168 = 0.786 A: 36/168 = 0.214 9.744E-02 GSE19189 vs. 
HapMap CEU HapMap CEU G: 150/224 = 0.670 A: 74/224 = 0.330 2.747 24 16 G and p-values are Williams’-corrected 171 53 4.289 3.836E-02 199 e. GSE10506: Nancarrow et al. (2008) Allele frequencies Nancarrow, GSE10506 GSE10506 HapMap G1 p-value1 G: 31/44 = 0.705 A: 13/44 = 0.295 HapMap Phase 1 (aggr.) G: 569/784 = 0.726 A: 215/784 = 0.274 GSE10506 vs. HapMap aggr. 31 13 569 215 31 13 150 74 31 13 132 36 31 13 116 52 1 1.218 2.698E-01 0.033 8.559E-01 GSE10506 vs. HapMap YRI HapMap YRI G: 171/224 = 0.763 A: 53/224 = 0.237 6.515E-01 GSE10506 vs. HapMap JPT HapMap JPT G: 116/168 = 0.690 A: 52/168 = 0.310 0.204 GSE10506 vs. HapMap CHB HapMap CHB G: 132/168 = 0.786 A: 36/168 = 0.214 7.616E-01 GSE10506 vs. HapMap CEU HapMap CEU G: 150/224 = 0.670 A: 74/224 = 0.330 0.092 31 13 G and p-values are Williams’-corrected 171 53 0.653 4.190E-01 200 f. GSE18799: Popova et al. (2009) Allele frequencies Popova, GSE18799 GSE18799 HapMap G1 p-value1 G: 20/34 = 0.588 A: 14/34 = 0.411 HapMap Phase 1 (aggr.) G: 569/784 = 0.726 A: 215/784 = 0.274 GSE18799 vs. HapMap aggr. 20 14 569 215 20 14 150 74 20 14 132 36 20 14 116 52 1 5.312 2.118E-02 1.281 2.577E-01 GSE18799 vs. HapMap YRI HapMap YRI G: 171/224 = 0.763 A: 53/224 = 0.237 3.605E-01 GSE18799 vs. HapMap JPT HapMap JPT G: 116/168 = 0.690 A: 52/168 = 0.310 0.836 GSE18799 vs. HapMap CHB HapMap CHB G: 132/168 = 0.786 A: 36/168 = 0.214 9.491E-02 GSE18799 vs. HapMap CEU HapMap CEU G: 150/224 = 0.670 A: 74/224 = 0.330 2.789 20 14 G and p-values are Williams’-corrected 171 53 4.254 3.916E-02 201 g. GSE9003: Stark & Hayward (2007) Allele frequencies Stark, GSE9003 GSE9003 HapMap G1 p-value1 G: 86/112 = 0.769 A: 26/112 = 0.232 HapMap Phase 1 (aggr.) G: 569/784 = 0.726 A: 215/784 = 0.274 GSE9003 vs. HapMap aggr. 86 26 569 215 86 26 150 74 86 26 132 36 86 26 116 52 1 0.123 7.258E-01 2.016 1.556E-01 GSE9003 vs. HapMap YRI HapMap YRI G: 171/224 = 0.763 A: 53/224 = 0.237 6.089E-02 GSE9003 vs. 
HapMap JPT HapMap JPT G: 116/168 = 0.690 A: 52/168 = 0.310 3.513 GSE9003 vs. HapMap CHB HapMap CHB G: 132/168 = 0.786 A: 36/168 = 0.214 3.425E-01 GSE9003 vs. HapMap CEU HapMap CEU G: 150/224 = 0.670 A: 74/224 = 0.330 0.901 86 26 G and p-values are Williams’-corrected 171 53 0.008 9.287E-01 202 h. GSE19177: Waddell et al. (2010) Allele frequencies Waddell, GSE19177 GSE19177 HapMap G1 p-value1 G: 34/54 = 0.630 A: 20/54 = 0.370 HapMap Phase 1 (aggr.) G: 569/784 = 0.726 A: 215/784 = 0.274 GSE19177 vs. HapMap aggr. 34 20 569 215 34 20 150 74 34 20 132 36 34 20 116 52 1 4.931 2.638E-02 0.672 4.124E-01 GSE19177 vs. HapMap YRI HapMap YRI G: 171/224 = 0.763 A: 53/224 = 0.237 5.808E-01 GSE19177 vs. HapMap JPT HapMap JPT G: 116/168 = 0.690 A: 52/168 = 0.310 0.305 GSE19177 vs. HapMap CHB HapMap CHB G: 132/168 = 0.786 A: 36/168 = 0.214 1.407E-01 GSE19177 vs. HapMap CEU HapMap CEU G: 150/224 = 0.670 A: 74/224 = 0.330 2.17 34 20 G and p-values are Williams’-corrected 171 53 3.759 5.252E-02 203 6. rs10403583 (LGALS4) a. GSE21168: Castillo et al. (2010) Allele frequencies Castillo, GSE21168 GSE21168 HapMap G1 p-value1 A: 1/6 = 0.167 G: 5/6 = 0.833 HapMap Phase 1 (aggr.) A: 132/792 = 0.167 G: 660/792 = 0.833 GSE21168 vs. HapMap aggr. 1 5 132 660 1 5 37 109 1 5 18 150 1 5 13 159 1 0.148 7.005E-01 0.384 5.355E-01 GSE21168 vs. HapMap YRI HapMap YRI A: 64/226 = 0.283 G: 162/226 = 0.717 6.353E-01 GSE21168 vs. HapMap JPT HapMap JPT A: 13/172 = 0.076 G: 159/172 = 0.924 0.225 GSE21168 vs. HapMap CHB HapMap CHB A: 18/168 = 0.107 G: 150/168 = 0.893 1 GSE21168 vs. HapMap CEU HapMap CEU A: 37/226 = 0.164 G: 109/226 = 0.836 0 1 5 G and p-values are Williams’-corrected 64 162 0.393 5.307E-01 204 b. GSE16019: Chen M et al. (2009) Allele frequencies Chen, GSE16019 GSE16019 HapMap G1 p-value1 A: 31/158 = 0.196 G: 127/158 = 0.804 HapMap Phase 1 (aggr.) A: 132/792 = 0.167 G: 660/792 = 0.833 GSE16019 vs. HapMap aggr. 
31 127 132 660 31 127 37 109 31 127 18 150 31 127 13 159 1 5.043 2.473E-02 10.456 1.223E-03 GSE16019 vs. HapMap YRI HapMap YRI A: 64/226 = 0.283 G: 162/226 = 0.717 2.334E-01 GSE16019 vs. HapMap JPT HapMap JPT A: 13/172 = 0.076 G: 159/172 = 0.924 1.42 GSE16019 vs. HapMap CHB HapMap CHB A: 18/168 = 0.107 G: 150/168 = 0.893 3.768E-01 GSE16019 vs. HapMap CEU HapMap CEU A: 37/226 = 0.164 G: 109/226 = 0.836 0.781 31 127 G and p-values are Williams’-corrected 64 162 3.826 5.046E-02 205 c. GSE13282: Gordan et al. (2008) Allele frequencies Gordan, GSE13282 GSE13282 HapMap G1 p-value1 A: 8/42 = 0.19 G: 34/42 = 0.81 HapMap Phase 1 (aggr.) A: 132/792 = 0.167 G: 660/792 = 0.833 GSE13282 vs. HapMap aggr. 8 34 132 660 8 34 37 109 8 34 37 109 8 34 18 150 1 0.723 3.952E-01 1.894 1.688E-01 GSE13282 vs. HapMap YRI HapMap YRI A: 64/226 = 0.283 G: 162/226 = 0.717 3.952E-01 GSE13282 vs. HapMap JPT HapMap JPT A: 18/168 = 0.107 G: 150/168 = 0.893 0.723 GSE13282 vs. HapMap CHB HapMap CHB A: 37/226 = 0.164 G: 109/226 = 0.836 6.957E-01 GSE13282 vs. HapMap CEU HapMap CEU A: 37/226 = 0.164 G: 109/226 = 0.836 0.153 8 34 G and p-values are Williams’-corrected 64 162 1.617 2.035E-01 206 d. GSE19189: Letouze et al. (2010) Allele frequencies Letouze, GSE19189 GSE19189 HapMap G1 p-value1 A: 3/36 = 0.083 G: 33/36 = 0.917 HapMap Phase 1 (aggr.) A: 132/792 = 0.167 G: 660/792 = 0.833 GSE19189 vs. HapMap aggr. 3 33 132 660 3 33 37 109 3 33 37 109 3 33 13 159 1 5.618 1.778E-02 0.024 8.769E-01 GSE19189 vs. HapMap YRI HapMap YRI A: 64/226 = 0.283 G: 162/226 = 0.717 1.778E-02 GSE19189 vs. HapMap JPT HapMap JPT A: 13/172 = 0.076 G: 159/172 = 0.924 5.618 GSE19189 vs. HapMap CHB HapMap CHB A: 37/226 = 0.164 G: 109/226 = 0.836 1.579E-01 GSE19189 vs. HapMap CEU HapMap CEU A: 37/226 = 0.164 G: 109/226 = 0.836 1.994 3 33 G and p-values are Williams’-corrected 64 162 7.742 5.395E-03 207 e. GSE10506: Nancarrow et al. 
(2008) Allele frequencies Nancarrow, GSE10506 GSE10506 HapMap G1 p-value1 A: 6/42 = 0.143 G: 36/42 = 0.857 HapMap Phase 1 (aggr.) A: 132/792 = 0.167 G: 660/792 = 0.833 GSE10506 vs. HapMap aggr. 6 36 132 660 6 36 37 109 6 36 18 150 6 36 13 159 1 0.389 8.232E-01 1.616 4.457E-01 GSE10506 vs. HapMap YRI HapMap YRI A: 64/226 = 0.283 G: 162/226 = 0.717 3.015E-01 GSE10506 vs. HapMap JPT HapMap JPT A: 13/172 = 0.076 G: 159/172 = 0.924 2.398 GSE10506 vs. HapMap CHB HapMap CHB A: 18/168 = 0.107 G: 150/168 = 0.893 6.837E-01 GSE10506 vs. HapMap CEU HapMap CEU A: 37/226 = 0.164 G: 109/226 = 0.836 0.166 6 36 G and p-values are Williams’-corrected 64 162 3.946 1.390E-01 208 f. GSE18799: Popova et al. (2009) Allele frequencies Popova, GSE18799 GSE18799 HapMap G1 p-value1 A: 12/42 = 0.286 G: 30/42 = 0.714 HapMap Phase 1 (aggr.) A: 132/792 = 0.167 G: 660/792 = 0.833 GSE18799 vs. HapMap aggr. 12 30 132 660 12 30 37 109 12 30 18 150 12 30 13 159 1 7.366 6.647E-03 11.505 6.941E-04 GSE18799 vs. HapMap YRI HapMap YRI A: 64/226 = 0.283 G: 162/226 = 0.717 6.792E-01 GSE18799 vs. HapMap JPT HapMap JPT A: 13/172 = 0.076 G: 159/172 = 0.924 0.171 GSE18799 vs. HapMap CHB HapMap CHB A: 18/168 = 0.107 G: 150/168 = 0.893 6.551E-02 GSE18799 vs. HapMap CEU HapMap CEU A: 37/226 = 0.164 G: 109/226 = 0.836 3.392 12 30 G and p-values are Williams’-corrected 64 162 0.001 9.748E-01 209 g. GSE9003: Stark & Hayward (2007) Allele frequencies Stark, GSE9003 GSE9003 HapMap G1 p-value1 A: 21/144 = 0.146 G: 123/144 = 0.854 HapMap Phase 1 (aggr.) A: 132/792 = 0.167 G: 660/792 = 0.833 GSE9003 vs. HapMap aggr. 21 123 132 660 21 123 37 109 21 123 18 150 21 123 13 159 1 1.044 3.069E-01 3.964 4.648E-02 GSE9003 vs. HapMap YRI HapMap YRI A: 64/226 = 0.283 G: 162/226 = 0.717 2.188E-02 GSE9003 vs. HapMap JPT HapMap JPT A: 13/172 = 0.076 G: 159/172 = 0.924 5.255 GSE9003 vs. HapMap CHB HapMap CHB A: 18/168 = 0.107 G: 150/168 = 0.893 5.307E-01 GSE9003 vs. 
HapMap CEU HapMap CEU A: 37/226 = 0.164 G: 109/226 = 0.836 0.393 21 123 G and p-values are Williams’-corrected 64 162 9.754 1.789E-03

h. GSE19177: Waddell et al. (2010)
Allele frequencies Waddell, GSE19177 GSE19177 HapMap G1 p-value1 A: 6/64 = 0.094 G: 58/64 = 0.906 HapMap Phase 1 (aggr.) A: 132/792 = 0.167 G: 660/792 = 0.833 GSE19177 vs. HapMap aggr. 6 58 132 660 HapMap CHB A: 18/168 = 0.107 G: 150/168 = 0.893 6 58 6 58 37 109 18 150 6 58 13 159 1 5.587E-03 GSE19177 vs. HapMap CHB 0.089 9.248E-01 0.196 6.580E-01 GSE19177 vs. HapMap YRI HapMap YRI A: 64/226 = 0.283 G: 162/226 = 0.717 7.679 GSE19177 vs. HapMap JPT HapMap JPT A: 13/172 = 0.076 G: 159/172 = 0.924 1.071E-01 GSE19177 vs. HapMap CEU HapMap CEU A: 37/226 = 0.164 G: 109/226 = 0.836 2.597 6 58 G and p-values are Williams’-corrected 64 162 11.289 7.797E-04

7. rs10489789 (LGALS8)

a. GSE16019: Chen M et al. (2009)
Allele frequencies Chen, GSE16019 GSE16019 HapMap G1 p-value1 G: 136/150 = 0.907 A: 14/150 = 0.093 HapMap Phase 1 (aggr.) G: 670/768 = 0.872 A: 98/768 = 0.128 GSE16019 vs. HapMap aggr. 136 14 670 98 136 14 193 29 136 14 158 2 136 14 164 2 1 11.101 8.628E-04 11.567 6.713E-04 GSE16019 vs. HapMap YRI HapMap YRI G: 155/220 = 0.705 A: 65/220 = 0.295 2.672E-01 GSE16019 vs. HapMap JPT HapMap JPT G: 164/166 = 0.988 A: 2/166 = 0.012 1.231 GSE16019 vs. HapMap CHB HapMap CHB G: 158/160 = 0.988 A: 2/160 = 0.013 2.288E-01 GSE16019 vs. HapMap CEU HapMap CEU G: 193/222 = 0.869 A: 29/222 = 0.131 1.448 136 14 G and p-values are Williams’-corrected 155 65 23.461 1.275E-06

b. GSE13282: Gordan et al. (2008)
Allele frequencies Gordan, GSE13282 GSE13282 HapMap G1 p-value1 G: 36/42 = 0.857 A: 6/42 = 0.143 HapMap Phase 1 (aggr.) G: 670/768 = 0.872 A: 98/768 = 0.128 GSE13282 vs. HapMap aggr. 36 6 670 98 36 6 193 29 36 6 158 2 36 6 164 2 1 10.297 1.332E-03 10.569 1.150E-03 GSE13282 vs. HapMap YRI HapMap YRI G: 155/220 = 0.705 A: 65/220 = 0.295 8.339E-01 GSE13282 vs. HapMap JPT HapMap JPT G: 164/166 = 0.988 A: 2/166 = 0.012 0.044 GSE13282 vs.
HapMap CHB HapMap CHB G: 158/160 = 0.988 A: 2/160 = 0.013 7.800E-01 GSE13282 vs. HapMap CEU HapMap CEU G: 193/222 = 0.869 A: 29/222 = 0.131 0.078 36 6 G and p-values are Williams’-corrected 155 65 4.554 3.284E-02

c. GSE19189: Letouze et al. (2010)
Allele frequencies Letouze, GSE19189 GSE19189 HapMap G1 p-value1 G: 29/40 = 0.725 A: 11/40 = 0.275 HapMap Phase 1 (aggr.) G: 670/768 = 0.872 A: 98/768 = 0.128 GSE19189 vs. HapMap aggr. 29 11 670 98 29 11 193 29 HapMap JPT G: 164/166 = 0.988 A: 2/166 = 0.012 29 11 29 11 158 2 164 2 1 3.125E-02 25.896 3.603E-07 GSE19189 vs. HapMap JPT 26.457 2.695E-07 GSE19189 vs. HapMap YRI HapMap YRI G: 155/220 = 0.705 A: 65/220 = 0.295 4.639 GSE19189 vs. HapMap CHB HapMap CHB G: 158/160 = 0.988 A: 2/160 = 0.013 1.795E-02 GSE19189 vs. HapMap CEU HapMap CEU G: 193/222 = 0.869 A: 29/222 = 0.131 5.601 29 11 G and p-values are Williams’-corrected 155 65 0.068 7.943E-01

d. GSE10506: Nancarrow et al. (2008)
Allele frequencies Nancarrow, GSE10506 GSE10506 HapMap G1 p-value1 G: 38/44 = 0.864 A: 4/44 = 0.091 HapMap Phase 1 (aggr.) G: 670/768 = 0.872 A: 98/768 = 0.128 GSE10506 vs. HapMap aggr. 38 4 670 98 38 4 193 29 38 4 158 2 38 4 164 2 1 5.344 2.079E-02 5.509 1.892E-02 GSE10506 vs. HapMap YRI HapMap YRI G: 155/220 = 0.705 A: 65/220 = 0.295 5.189E-01 GSE10506 vs. HapMap JPT HapMap JPT G: 164/166 = 0.988 A: 2/166 = 0.012 0.416 GSE10506 vs. HapMap CHB HapMap CHB G: 158/160 = 0.988 A: 2/160 = 0.013 5.297E-01 GSE10506 vs. HapMap CEU HapMap CEU G: 193/222 = 0.869 A: 29/222 = 0.131 0.395 38 4 G and p-values are Williams’-corrected 155 65 8.482 3.587E-03

e. GSE18799: Popova et al. (2009)
Allele frequencies Popova, GSE18799 GSE18799 HapMap G1 p-value1 G: 34/42 = 0.810 A: 8/42 = 0.190 HapMap Phase 1 (aggr.) G: 670/768 = 0.872 A: 98/768 = 0.128 GSE18799 vs. HapMap aggr. 34 8 670 98 34 8 193 29 34 8 158 2 34 8 164 2 1 15.863 6.810E-05 16.246 5.563E-05 GSE18799 vs. HapMap YRI HapMap YRI G: 155/220 = 0.705 A: 65/220 = 0.295 3.297E-01 GSE18799 vs.
HapMap JPT HapMap JPT G: 164/166 = 0.988 A: 2/166 = 0.012 0.95 GSE18799 vs. HapMap CHB HapMap CHB G: 158/160 = 0.988 A: 2/160 = 0.013 2.715E-01 GSE18799 vs. HapMap CEU HapMap CEU G: 193/222 = 0.869 A: 29/222 = 0.131 1.209 34 8 G and p-values are Williams’-corrected 155 65 2.027 1.545E-01

f. GSE9003: Stark & Hayward (2007)
Allele frequencies Stark, GSE9003 GSE9003 HapMap G1 p-value1 G: 130/146 = 0.890 A: 16/146 = 0.110 HapMap Phase 1 (aggr.) G: 670/768 = 0.872 A: 98/768 = 0.128 GSE9003 vs. HapMap aggr. 130 16 670 98 130 16 193 29 130 16 158 2 130 16 164 2 1 14.087 1.745E-04 14.643 1.299E-04 GSE9003 vs. HapMap YRI HapMap YRI G: 155/220 = 0.705 A: 65/220 = 0.295 5.468E-01 GSE9003 vs. HapMap JPT HapMap JPT G: 164/166 = 0.988 A: 2/166 = 0.012 0.363 GSE9003 vs. HapMap CHB HapMap CHB G: 158/160 = 0.988 A: 2/160 = 0.013 5.419E-01 GSE9003 vs. HapMap CEU HapMap CEU G: 193/222 = 0.869 A: 29/222 = 0.131 0.372 130 16 G and p-values are Williams’-corrected 155 65 18.783 1.465E-05

g. GSE19177: Waddell et al. (2010)
Allele frequencies Waddell, GSE19177 GSE19177 HapMap G1 p-value1 G: 48/58 = 0.828 A: 10/58 = 0.172 HapMap Phase 1 (aggr.) G: 670/768 = 0.872 A: 98/768 = 0.128 GSE19177 vs. HapMap aggr. 48 10 670 98 48 10 193 29 48 10 158 2 48 10 164 2 1 17.16 3.436E-05 17.657 2.645E-05 GSE19177 vs. HapMap YRI HapMap YRI G: 155/220 = 0.705 A: 65/220 = 0.295 4.266E-01 GSE19177 vs. HapMap JPT HapMap JPT G: 164/166 = 0.988 A: 2/166 = 0.012 0.632 GSE19177 vs. HapMap CHB HapMap CHB G: 158/160 = 0.988 A: 2/160 = 0.013 3.471E-01 GSE19177 vs. HapMap CEU HapMap CEU G: 193/222 = 0.869 A: 29/222 = 0.131 0.884 48 10 G and p-values are Williams’-corrected 155 65 3.763 5.240E-02

APPENDIX D
SNP genotypes in the GSE11976 dataset

Galectin upstream SNP genotypes for GSE11976 (Staaf et al., 2008), HCC1395 breast carcinoma cell line.
Gene            SNP           Location          Alleles (ref/other)
LGALS1          rs4820294     chr22:36400989    G/A
LGALS1          rs929039      chr22:36401457    T/C
LGALS2          rs2235338     chr22:36295826    G/A
LGALS4          rs10403583    chr19:43983611    A/G
LGALS8 (long)   rs10489789    chr1:234746800    G/A
LGALS9          rs3763959     chr17:22981461    A/G
CLC/LGALS10     rs428007      chr19:44913207    C/T

APPENDIX E
G-test results for Hardy-Weinberg equilibrium

1. rs428007 (CLC/LGALS10)

Genotype frequencies
Dataset               T/T              T/C              C/C
GSE16019, Chen        18/77 = 0.234    30/77 = 0.390    29/77 = 0.377
GSE13282, Gordan      1/21 = 0.048     12/21 = 0.571    8/21 = 0.381
GSE19189, Letouze     1/20 = 0.050     11/20 = 0.550    8/20 = 0.400
GSE10506, Nancarrow   2/20 = 0.100     11/20 = 0.550    7/20 = 0.350
GSE18799, Popova      3/21 = 0.143     9/21 = 0.429     9/21 = 0.429
GSE9003, Stark        7/73 = 0.096     25/73 = 0.342    41/73 = 0.562
GSE19177, Waddell     6/31 = 0.194     11/31 = 0.355    14/31 = 0.452

Dataset               Observed (T/T, T/C, C/C)   Expected (T/T, T/C, C/C)   G¹       p-value¹
GSE16019, Chen        18, 30, 29                 14.143, 37.714, 25.143     3.201    0.202
GSE13282, Gordan      1, 12, 8                   2.333, 9.333, 9.333        1.813    0.404
GSE19189, Letouze     1, 11, 8                   2.113, 8.775, 9.113        1.393    0.498
GSE10506, Nancarrow   2, 11, 7                   2.813, 9.375, 7.813        0.596    0.742
GSE18799, Popova      3, 9, 9                    2.679, 9.643, 8.679        0.09     0.956
GSE9003, Stark        7, 25, 41                  5.209, 28.582, 39.209      1.095    0.578
GSE19177, Waddell     6, 11, 14                  4.266, 14.468, 12.267      1.729    0.421

¹ G and p-values are Williams’-corrected

2. rs929039 (LGALS1)

Genotype frequencies
Dataset               T/T              T/C              C/C
GSE16019, Chen        34/75 = 0.453    34/75 = 0.453    7/75 = 0.093
GSE13282, Gordan      6/18 = 0.333     10/18 = 0.556    2/18 = 0.111
GSE19189, Letouze     7/19 = 0.368     10/19 = 0.526    2/19 = 0.105
GSE10506, Nancarrow   13/20 = 0.650    4/20 = 0.200     3/20 = 0.150
GSE18799, Popova      7/19 = 0.368     8/19 = 0.421     4/19 = 0.211
GSE9003, Stark        30/64 = 0.469    28/64 = 0.438    6/64 = 0.094
GSE19177, Waddell     13/32 = 0.406    11/32 = 0.344    8/32 = 0.250

Dataset               Observed (T/T, T/C, C/C)   Expected (T/T, T/C, C/C)   G¹       p-value¹
GSE16019, Chen        34, 34, 7                  34.680, 32.640, 7.680      0.13     0.937
GSE13282, Gordan      6, 10, 2                   6.722, 8.556, 2.722        0.504    0.777
GSE19189, Letouze     7, 10, 2                   7.579, 8.842, 2.579        0.332    0.847
GSE10506, Nancarrow   13, 4, 3                   11.250, 7.500, 1.250       3.855    0.146
GSE18799, Popova      7, 8, 4                    6.368, 9.263, 3.368        0.341    0.843
GSE9003, Stark        30, 28, 6                  30.250, 27.500, 6.250      0.021    0.99
GSE19177, Waddell     13, 11, 8                  10.695, 15.609, 5.695      2.754    0.252

¹ G and p-values are Williams’-corrected

3. rs2235338 (LGALS2)

Genotype frequencies
Dataset               A/A              A/G              G/G
GSE21168, Castillo    0/4 = 0.000      3/4 = 0.750      1/4 = 0.250
GSE16019, Chen        14/77 = 0.182    37/77 = 0.481    26/77 = 0.338
GSE13282, Gordan      8/21 = 0.381     6/21 = 0.286     7/21 = 0.333
GSE19189, Letouze     4/19 = 0.211     7/19 = 0.368     8/19 = 0.421
GSE10506, Nancarrow   3/16 = 0.188     4/16 = 0.250     9/16 = 0.563
GSE18799, Popova      4/18 = 0.222     3/18 = 0.167     11/18 = 0.611
GSE9003, Stark        20/67 = 0.299    22/67 = 0.328    25/67 = 0.373
GSE19177, Waddell     10/31 = 0.323    6/31 = 0.194     15/31 = 0.484

Dataset               Observed (A/A, A/G, G/G)   Expected (A/A, A/G, G/G)   G¹       p-value¹
GSE21168, Castillo    0, 3, 1                    1.563, 1.875, 0.563        1.652    0.438
GSE16019, Chen        14, 37, 26                 14.329, 37.342, 24.329     0.017    0.991
GSE13282, Gordan      8, 6, 7                    3.613, 9.775, 6.613        3.835    0.147
GSE19189, Letouze     4, 7, 8                    2.579, 8.842, 7.579        0.994    0.608
GSE10506, Nancarrow   3, 4, 9                    2.813, 9.375, 7.813        2.605    0.272
GSE18799, Popova      4, 3, 11                   6.368, 9.263, 3.368        6.306    0.043
GSE9003, Stark        20, 22, 25                 9.753, 33.493, 28.753      7.791    0.02
GSE19177, Waddell     10, 6, 15                  7.066, 16.868, 10.066      11.649   0.002954

¹ G and p-values are Williams’-corrected

4. rs3763959 (LGALS9)

Genotype frequencies
Dataset               A/A              A/G              G/G
GSE21168, Castillo    0/4 = 0.000      3/4 = 0.750      1/4 = 0.250
GSE16019, Chen        14/77 = 0.182    37/77 = 0.481    26/77 = 0.338
GSE13282, Gordan      8/21 = 0.381     6/21 = 0.286     7/21 = 0.333
GSE19189, Letouze     4/19 = 0.211     7/19 = 0.368     8/19 = 0.421
GSE10506, Nancarrow   3/16 = 0.188     4/16 = 0.250     9/16 = 0.563
GSE18799, Popova      4/18 = 0.222     3/18 = 0.167     11/18 = 0.611
GSE9003, Stark        20/67 = 0.299    22/67 = 0.328    25/67 = 0.373
GSE19177, Waddell     10/31 = 0.323    6/31 = 0.194     15/31 = 0.484

Dataset               Observed (A/A, A/G, G/G)   Expected (A/A, A/G, G/G)   G¹       p-value¹
GSE21168, Castillo    0, 3, 1                    0.563, 1.875, 1.563        1.652    0.438
GSE16019, Chen        14, 37, 26                 13.718, 37.565, 25.718     0.017    0.991
GSE13282, Gordan      8, 6, 7                    5.762, 10.476, 4.762       3.835    0.147
GSE19189, Letouze     4, 7, 8                    2.961, 9.079, 6.961        0.994    0.608
GSE10506, Nancarrow   3, 4, 9                    1.563, 6.875, 7.563        2.605    0.272
GSE18799, Popova      4, 3, 11                   1.681, 7.639, 8.681        6.306    0.043
GSE9003, Stark        20, 22, 25                 14.343, 33.313, 19.343     7.791    0.02
GSE19177, Waddell     10, 6, 15                  5.452, 15.097, 10.452      11.649   0.002954

¹ G and p-values are Williams’-corrected

5. rs4820294 (LGALS1)

Genotype frequencies
Dataset               G/G              A/G              A/A
GSE21168, Castillo    0/3 = 0.000      1/3 = 0.333      2/3 = 0.667
GSE16019, Chen        34/70 = 0.486    29/70 = 0.414    7/70 = 0.100
GSE13282, Gordan      7/15 = 0.467     7/15 = 0.467     1/15 = 0.067
GSE19189, Letouze     7/20 = 0.350     10/20 = 0.500    3/20 = 0.150
GSE10506, Nancarrow   14/22 = 0.636    3/22 = 0.136     5/22 = 0.227
GSE18799, Popova      6/17 = 0.353     8/17 = 0.471     3/17 = 0.176
GSE9003, Stark        38/56 = 0.679    10/56 = 0.179    8/56 = 0.143
GSE19177, Waddell     13/27 = 0.481    8/27 = 0.296     6/27 = 0.222

Dataset               Observed (G/G, A/G, A/A)   Expected (G/G, A/G, A/A)   G¹       p-value¹
GSE21168, Castillo    0, 1, 2                    0.083, 0.833, 2.083        0.165    0.921
GSE16019, Chen        34, 29, 7                  33.604, 29.793, 6.604      0.049    0.976
GSE13282, Gordan      7, 7, 1                    7.350, 6.300, 1.350        0.184    0.912
GSE19189, Letouze     7, 10, 3                   7.200, 9.600, 3.200        0.035    0.983
GSE10506, Nancarrow   14, 3, 5                   10.920, 9.159, 1.920       9.539    0.008486
GSE18799, Popova      6, 8, 3                    5.882, 8.235, 2.882        0.013    0.993
GSE9003, Stark        38, 10, 8                  33.018, 19.964, 3.018      12.305   0.002128
GSE19177, Waddell     13, 8, 6                   10.704, 12.593, 3.704      3.497    0.174

¹ G and p-values are Williams’-corrected

6. rs10403583 (LGALS4)

Genotype frequencies
Dataset               A/A              A/G              G/G
GSE21168, Castillo    0/3 = 0.000      1/3 = 0.333      2/3 = 0.667
GSE16019, Chen        3/79 = 0.038     25/79 = 0.316    51/79 = 0.646
GSE13282, Gordan      1/21 = 0.048     6/21 = 0.286     14/21 = 0.667
GSE19189, Letouze     0/18 = 0.000     3/18 = 0.167     15/18 = 0.833
GSE10506, Nancarrow   0/21 = 0.000     6/21 = 0.286     15/21 = 0.714
GSE18799, Popova      2/21 = 0.095     8/21 = 0.381     11/21 = 0.524
GSE9003, Stark        5/72 = 0.069     11/72 = 0.153    56/72 = 0.778
GSE19177, Waddell     0/32 = 0.000     6/32 = 0.188     26/32 = 0.813

Dataset               Observed (A/A, A/G, G/G)   Expected (A/A, A/G, G/G)   G¹         p-value¹
GSE21168, Castillo    0, 1, 2                    0.083, 0.833, 2.083        0.165      0.921
GSE16019, Chen        3, 25, 51                  3.041, 24.918, 51.041      0.000856   1
GSE13282, Gordan      1, 6, 14                   0.762, 6.476, 13.762       0.104      0.949
GSE19189, Letouze     0, 3, 15                   0.125, 2.750, 15.125       0.273      0.872
GSE10506, Nancarrow   0, 6, 15                   0.429, 5.143, 15.429       0.974      0.615
GSE18799, Popova      2, 8, 11                   1.714, 8.571, 10.714       0.089      0.957
GSE9003, Stark        5, 11, 56                  1.531, 17.938, 52.531      8.162      0.017
GSE19177, Waddell     0, 6, 26                   0.281, 5.438, 26.281       0.609      0.737

¹ G and p-values are Williams’-corrected

7. rs10489789 (LGALS8)

Genotype frequencies
Dataset               G/G              A/G              A/A
GSE16019, Chen        63/75 = 0.840    10/75 = 0.133    2/75 = 0.027
GSE13282, Gordan      16/21 = 0.762    4/21 = 0.190     1/21 = 0.048
GSE19189, Letouze     12/20 = 0.600    5/20 = 0.250     3/20 = 0.150
GSE10506, Nancarrow   18/21 = 0.857    2/21 = 0.095     1/21 = 0.048
GSE18799, Popova      14/21 = 0.667    6/21 = 0.286     1/21 = 0.048
GSE9003, Stark        59/73 = 0.808    12/73 = 0.164    2/73 = 0.027
GSE19177, Waddell     21/29 = 0.724    6/29 = 0.207     2/29 = 0.069

Dataset               Observed (G/G, A/G, A/A)   Expected (G/G, A/G, A/A)   G¹       p-value¹
GSE16019, Chen        63, 10, 2                  61.653, 12.693, 0.653      2.407    0.3
GSE13282, Gordan      16, 4, 1                   15.429, 5.143, 0.429       0.822    0.663
GSE19189, Letouze     12, 5, 3                   10.513, 7.975, 1.513       2.617    0.27
GSE10506, Nancarrow   18, 2, 1                   17.190, 3.619, 0.190       2.521    0.284
GSE18799, Popova      14, 6, 1                   13.762, 6.476, 0.762       0.104    0.949
GSE9003, Stark        59, 12, 2                  57.877, 14.247, 0.877      1.435    0.488
GSE19177, Waddell     21, 6, 2                   19.862, 8.276, 0.862       1.806    0.405

¹ G and p-values are Williams’-corrected

LITERATURE CITED

Ahmed H. (2010) Promoter Methylation in Prostate Cancer and its Application for the Early Detection of Prostate Cancer Using Serum and Urine Samples. Biomark Cancer. 2010(2):17-33.

Ahmed H, Banerjee PP, Vasta GR. (2007) Differential expression of galectins in normal, benign and malignant prostate epithelial cells: silencing of galectin-3 expression in prostate cancer by its promoter methylation. Biochem Biophys Res Commun. 358(1):241-6.

Alves CM, Silva DA, Azzolini AE, Marzocchi-Machado CM, Carvalho JV, Pajuaba AC, Lucisano-Valim YM, Chammas R, Liu FT, Roque-Barreira MC, Mineo JR. (2010) Galectin-3 plays a modulatory role in the life span and activation of murine neutrophils during early Toxoplasma gondii infection. Immunobiology. 215(6):475-85.

Ameur A, Rada-Iglesias A, Komorowski J, Wadelius C. (2009) Identification of candidate regulatory SNPs by combination of transcription-factor-binding site prediction, SNP genotyping and haploChIP. Nucleic Acids Res. 37(12):e85.

Banerjee D, Nandagopal K.
(2007) Potential interaction between the GARS-AIRS-GART Gene and CP2/LBP-1c/LSF transcription factor in Down syndrome-related Alzheimer disease. Cell Mol Neurobiol. 27(8):1117-26.

Barondes SH, Castronovo V, Cooper DN, Cummings RD, Drickamer K, Feizi T, Gitt MA, Hirabayashi J, Hughes C, Kasai K, Leffler H, Liu FT, Lotan R, Mercurio AM, Monsigny M, Pillai S, Poirer F, Raz A, Rigby PWJ, Rini JM, Wang JL. (1994a) Galectins: a family of animal beta-galactoside-binding lectins. Cell. 76(4):597-8.

Barondes SH, Cooper DN, Gitt MA, Leffler H. (1994b) Galectins. Structure and function of a large family of animal lectins. J Biol Chem. 269(33):20807-10.

Barondes SH, Gitt MA, Leffler H, Cooper DN. (1988) Multiple soluble vertebrate galactoside-binding lectins. Biochimie. 70(11):1627-32.

Beatty WL, Rhoades ER, Hsu DK, Liu FT, Russell DG. (2002) Association of a macrophage galactoside-binding protein with Mycobacterium-containing phagosomes. Cell Microbiol. 4(3):167-76.

Bernardes ES, Silva NM, Ruas LP, Mineo JR, Loyola AM, Hsu DK, Liu FT, Chammas R, Roque-Barreira MC. (2006) Toxoplasma gondii infection reveals a novel regulatory role for galectin-3 in the interface of innate and adaptive immunity. Am J Pathol. 168(6):1910-20.

Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, Vasa P, Ladd C, Beheshti J, Bueno R, Gillette M, Loda M, Weber G, Mark EJ, Lander ES, Wong W, Johnson BE, Golub TR, Sugarbaker DJ, Meyerson M. (2001) Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci U S A. 98(24):13790-5.

Boer JM, Huber WK, Sültmann H, Wilmer F, von Heydebreck A, Haas S, Korn B, Gunawan B, Vente A, Füzesi L, Vingron M, Poustka A. (2001) Identification and classification of differentially expressed genes in renal cell carcinoma by expression profiling on a global human 31,500-element cDNA array. Genome Res. 11(11):1861-70.

Boulianne RP, Liu Y, Aebi M, Lu BC, Kues U. (2000) Fruiting body development in Coprinus cinereus: regulated expression of two galectins secreted by a non-classical pathway. Microbiology. 146(Pt 8):1841-53.

Braccia A, Villani M, Immerdal L, Niels-Christiansen LL, Nystrom BT, Hansen GH, Danielsen EM. (2003) Microvillar membrane microdomains exist at physiological temperature: role of galectin-4 as lipid raft stabilizer revealed by ‘superrafts’. J Biol Chem. 278:15679-15684.

Bredel M, Bredel C, Juric D, Harsh GR, Vogel H, Recht LD, Sikic BI. (2005) Functional network analysis reveals extended gliomagenesis pathway maps and three novel MYC-interacting genes in human gliomas. Cancer Res. 65(19):8679-89.

Brewer CF. (2002) Binding and cross-linking properties of galectins. Biochim Biophys Acta. 1572(2-3):255-262.

Brewer CF, Miceli MC, Baum LG. (2002) Clusters, bundles, arrays and lattices: novel mechanisms for lectin-saccharide-mediated cellular interactions. Curr Opin Struct Biol. 12(5):616-23.

Buchholz M, Braun M, Heidenblut A, Kestler HA, Klöppel G, Schmiegel W, Hahn SA, Lüttges J, Gress TM. (2005) Transcriptome analysis of microdissected pancreatic intraepithelial neoplastic lesions. Oncogene. 24(44):6626-36.

Camby I, Le Mercier M, Lefranc F, Kiss R. (2006) Galectin-1: a small protein with major functions. Glycobiology. 16(11):137R-157R.

Carmack CS, McCue LA, Newberg LA, Lawrence CE. (2007) PhyloScan: identification of transcription factor binding sites using cross-species evidence. Algorithms Mol Biol. 2:1.

Castillo SD, Angulo B, Suarez-Gauthier A, Melchor L, Medina PP, Sanchez-Verde L, Torres-Lanzas J, Pita G, Benitez J, Sanchez-Cespedes M. (2010) Gene amplification of the transcription factor DP1 and CTNND1 in human lung cancer. J Pathol. 222(1):89-98.

Castronovo V, Van Den Brûle FA, Jackers P, Clausse N, Liu FT, Gillet C, Sobel ME. (1996) Decreased expression of galectin-3 is associated with progression of human breast cancer. J Pathol. 179(1):43-8.

Cereghetti GM, Scorrano L. (2006) The many shapes of mitochondrial death. Oncogene. 25(34):4717-24.

Cerhan JR, Liu-Mares W, Fredericksen ZS, Novak AJ, Cunningham JM, Kay NE, Dogan A, Liebow M, Wang AH, Call TG, Habermann TM, Ansell SM, Slager SL. (2008) Genetic variation in tumor necrosis factor and the nuclear factor-kappaB canonical pathway and risk of non-Hodgkin's lymphoma. Cancer Epidemiol Biomarkers Prev. 17(11):3161-9.

Chen HY, Fermin A, Vardhana S, Weng IC, Lo KF, Chang EY, Maverakis E, Yang RY, Hsu DK, Dustin ML, Liu FT. (2009) Galectin-3 negatively regulates TCR-mediated CD4+ T-cell activation at the immunological synapse. Proc Natl Acad Sci U S A. 106(34):14496-501.

Chen HZ, Tsai SY, Leone G. (2009) Emerging roles of E2Fs in cancer: an exit from cell cycle control. Nat Rev Cancer. 9(11):785-97.

Chen M, Ye Y, Yang H, Tamboli P, Matin S, Tannir NM, Wood CG, Gu J, Wu X. (2009) Genome-wide profiling of chromosomal alterations in renal cell carcinoma using high-density single nucleotide polymorphism arrays. Int J Cancer. 125(10):2342-8.

Chen X, Cheung ST, So S, Fan ST, Barry C, Higgins J, Lai KM, Ji J, Dudoit S, Ng IO, Van De Rijn M, Botstein D, Brown PO. (2002) Gene expression patterns in human liver cancers. Mol Biol Cell. 13(6):1929-39.

Chiariotti L, Salvatore P, Frunzio R, Bruni CB. (2004) Galectin genes: regulation of expression. Glycoconj J. 19(7-9):441-9.

Chung CH, Parker JS, Karaca G, Wu J, Funkhouser WK, Moore D, Butterfoss D, Xiang D, Zanation A, Yin X, Shockley WW, Weissler MC, Dressler LG, Shores CG, Yarbrough WG, Perou CM. (2004) Molecular classification of head and neck squamous cell carcinomas using patterns of gene expression. Cancer Cell. 5(5):489-500.

Collins PM, Hidari KI, Blanchard H. (2007) Slow diffusion of lactose out of galectin-3 crystals monitored by X-ray crystallography: possible implications for ligand-exchange protocols. Acta Crystallogr D Biol Crystallogr. 63(Pt 3):415-9.
Cordano P, Lake A, Shield L, Taylor GM, Alexander FE, Taylor PR, White J, Jarrett RF. (2005) Effect of IL-6 promoter polymorphism on incidence and outcome in Hodgkin's lymphoma. Br J Haematol. 128(4):493-5. Cortegano I, del Pozo V, Cárdaba B, de Andrés B, Gallardo S, del Amo A, Arrieta I, Jurado A, Palomino P, Liu FT, Lahoz C. (1998) Galectin-3 down-regulates IL-5 gene expression on different cell types. J Immunol. 161(1):385-9. Cromer A, Carles A, Millon R, Ganguli G, Chalmel F, Lemaire F, Young J, Dembélé D, Thibault C, Muller D, Poch O, Abecassis J, Wasylyk B. (2004) Identification of genes associated with tumorigenesis and metastatic potential of hypopharyngeal cancer by microarray analysis. Oncogene. 23(14):2484-98. Daly MJ. (2009) Assessing significance in genetic association studies. Cold Spring Harb Protoc. 2009(8):pdb.top58. Danguy A, Camby I, Kiss R. (2002) Galectins and cancer. Biochim Biophys Acta. 1572(2-3):285-93. Delacour D, Cramm-Behrens CI, Drobecq H, Le Bivic A, Naim HY, Jacob R. (2006) Requirement for galectin-3 in apical protein sorting. Curr Biol. 16(4):408-14. Demers M, Couillard J, Giglia-Mari G, Magnaldo T, St-Pierre Y. (2009) Increased galectin-7 gene expression in lymphoma cells is under the control of DNA methylation. Biochem Biophys Res Commun. 387(3):425-9. Demetriou M, Granovsky M, Quaggin S, Dennis JW. (2001) Negative regulation of Tcell activation and autoimmunity by Mgat5 N-glycosylation. Nature. 409(6821):733-9. Dhanasekaran SM, Barrette TR, Ghosh D, Shah R, Varambally S, Kurachi K, Pienta KJ, Rubin MA, Chinnaiyan AM. (2001) Delineation of prognostic biomarkers in prostate cancer. Nature. 412(6849):822-6. Dhirapong A, Lleo A, Leung P, Gershwin ME, Liu FT. (2009) The immunological potential of galectin-1 and -3. Autoimmun Rev. 8(5):360-3. Dimmer KS, Scorrano L. (2006) (De)constructing mitochondria: what for? Physiology. 21:233-41. 230 Draheim KM, Chen HB, Tao Q, Moore N, Roche M, Lyle S. 
(2010) ARRDC3 suppresses breast cancer progression by negatively regulating integrin beta4. Oncogene. 29(36):5032-47. Duhagon MA, Hurt EM, Sotelo-Silveira JR, Zhang X, Farrar WL. (2010) Genomic profiling of tumor initiating prostatospheres. BMC Genomics. 11:324. Duneau M, Boyer-Guittaut M, Gonzalez P, Charpentier S, Normand T, Dubois M, Raimond J, Legrand A. (2005) Galig, a novel cell death gene that encodes a mitochondrial protein promoting cytochrome c release. Exp Cell Res. 302(2):194-205. Dyrskjøt L, Kruhøffer M, Thykjaer T, Marcussen N, Jensen JL, Møller K, Ørntoft TF. (2004) Gene expression in the urinary bladder: a common carcinoma in situ gene expression signature exists disregarding histopathological classification. Cancer Res. 64(11):4040-8. Enjuanes A, Benavente Y, Bosch F, Martín-Guerrero I, Colomer D, Pérez-Alvarez S, Reina O, Ardanaz MT, Jares P, García-Orad A, Pujana MA, Montserrat E, de Sanjosé S, Campo E. (2008) Genetic variants in apoptosis and immunoregulation-related genes are associated with risk of chronic lymphocytic leukemia. Cancer Res. 68(24):10178-86. Fogel S, Guittaut M, Legrand A, Monsigny M, Hébert E. (1999) The tat protein of HIV-1 induces galectin-3 expression. Glycobiology. 9(4):383-7. French PJ, Swagemakers SM, Nagel JH, Kouwenhoven MC, Brouwer E, van der Spek P, Luider TM, Kros JM, van den Bent MJ, Sillevis Smitt PA. (2005) Gene expression profiles associated with treatment response in oligodendrogliomas. Cancer Res. 65(24):11335-44. Frierson HF Jr, El-Naggar AK, Welsh JB, Sapinoso LM, Su AI, Cheng J, Saku T, Moskaluk CA, Hampton GM. (2002) Large scale molecular analysis identifies genes with altered expression in salivary adenoid cystic carcinoma. Am J Pathol. 161(4):1315-23. Fukumori T, Oka N, Takenaka Y, Nangia-Makker P, Elsamman E, Kasai T, Shono M, Kanayama HO, Ellerhorst J, Lotan R, Raz A. (2006) Galectin-3 regulates mitochondrial stability and antiapoptotic function in response to anticancer drug in prostate cancer. 
Cancer Res. 66(6):3114-9. Fukumori T, Takenaka Y, Oka N, Yoshii T, Hogan V, Inohara H, Kanayama HO, Kim HR, Raz A. (2004) Endogenous galectin-3 determines the routing of CD95 apoptotic signaling pathways. Cancer Res. 64(10):3376-9. Fukumori T, Takenaka Y, Yoshii T, Kim HR, Hogan V, Inohara H, Kagawa S, Raz A. (2003) CD29 and CD7 mediate galectin-3-induced type II T-cell apoptosis. Cancer Res. 63(23):8302-11. Garber ME, Troyanskaya OG, Schluens K, Petersen S, Thaesler Z, Pacyna-Gengelbach M, van de Rijn M, Rosen GD, Perou CM, Whyte RI, Altman RB, Brown PO, Botstein D, Petersen I. (2001) Diversity of gene expression in adenocarcinoma of the lung. Proc Natl Acad Sci U S A. 98(24):13784-9. Garner OB, Baum LG. (2008) Galectin-glycan lattices regulate cell-surface glycoprotein organization and signalling. Biochem Soc Trans. 36(Pt 6):1472-7. Gauthier L, Rossi B, Roux F, Termine E, Schiff C. (2002) Galectin-1 is a stromal cell ligand of the pre-B cell receptor (BCR) implicated in synapse formation between pre-B and stromal cells and in pre-BCR triggering. Proc Natl Acad Sci U S A. 99(20):13014-9. Gauthier S, Pelletier I, Ouellet M, Vargas A, Tremblay MJ, Sato S, Barbeau B. (2008) Induction of galectin-1 expression by HTLV-I Tax and its impact on HTLV-I infectivity. Retrovirology. 5:105. Ginos MA, Page GP, Michalowicz BS, Patel KJ, Volker SE, Pambuccian SE, Ondrey FG, Adams GL, Gaffney PM. (2004) Identification of a gene expression signature associated with recurrent disease in squamous cell carcinoma of the head and neck. Cancer Res. 64(1):55-63. Gong HC, Honjo Y, Nangia-Makker P, Hogan V, Mazurak N, Bresalier RS, Raz A. (1999) The NH2 terminus of galectin-3 governs cellular compartmentalization and functions in cancer cells. Cancer Res. 59(24):6239-45. Gordan JD, Lal P, Dondeti VR, Letrero R, Parekh KN, Oquendo CE, Greenberg RA, Flaherty KT, Rathmell WK, Keith B, Simon MC, Nathanson KL. 
(2008) HIF-alpha effects on c-Myc distinguish two subtypes of sporadic VHL-deficient clear cell renal carcinoma. Cancer Cell. 14(6):435-46. Gordon GJ, Rockwell GN, Jensen RV, Rheinwald JG, Glickman JN, Aronson JP, Pottorf BJ, Nitz MD, Richards WG, Sugarbaker DJ, Bueno R. (2005) Identification of novel candidate oncogenes and tumor suppressors in malignant pleural mesothelioma using large-scale transcriptional profiling. Am J Pathol. 166(6):1827-40. Graudens E, Boulanger V, Mollard C, Mariage-Samson R, Barlet X, Grémy G, Couillault C, Lajémi M, Piatier-Tonneau D, Zaborski P, Eveno E, Auffray C, Imbeaud S. (2006) Deciphering cellular states of innate tumor drug responses. Genome Biol. 7(3):R19. Gu M, Wang W, Song WK, Cooper DN, Kaufman SJ. (1994) Selective modulation of the interaction of alpha 7 beta 1 integrin with fibronectin and laminin by L-14 lectin during skeletal muscle differentiation. J Cell Sci. 107:175-81. Guévremont M, Martel-Pelletier J, Boileau C, Liu FT, Richard M, Fernandes JC, Pelletier JP, Reboul P. (2004) Galectin-3 surface expression on human adult chondrocytes: a potential substrate for collagenase-3. Ann Rheum Dis. 63(6):636-43. Guittaut M, Charpentier S, Normand T, Dubois M, Raimond J, Legrand A. (2001) Identification of an internal gene to the human Galectin-3 gene with two different overlapping reading frames that do not encode Galectin-3. J Biol Chem. 276(4):2652-7. Gyorffy B, Kocsis I, Vasarhelyi B. (2003) Biallelic genotype distributions in papers published in Gut between 1998 and 2003: altered conclusions after recalculating the Hardy-Weinberg equilibrium. Gut. 53:614-5. Hadari YR, Arbel-Goren R, Levy Y, Amsterdam A, Alon R, Zakut R, Zick Y. (2000) Galectin-8 binding to integrins inhibits cell adhesion and induces apoptosis. J Cell Sci. 113:2385-97. Han S, Park K, Bae BN, Kim KH, Kim HJ, Kim YD, Kim HY. 
(2003) E2F1 expression is related with the poor survival of lymph node-positive breast cancer patients treated with fluorouracil, doxorubicin and cyclophosphamide. Breast Cancer Res Treat. 82(1):11-6. Hannenhalli S. (2008) Eukaryotic transcription factor binding sites--modeling and integrative search methods. Bioinformatics. 24(11):1325-31. Haqq C, Nosrati M, Sudilovsky D, Crothers J, Khodabakhsh D, Pulliam BL, Federman S, Miller JR 3rd, Allen RE, Singer MI, Leong SP, Ljung BM, Sagebiel RW, Kashani-Sabet M. (2005) The gene expression signatures of melanoma progression. Proc Natl Acad Sci U S A. 102(17):6092-7. Hartl DL, Jones EW. (2009) Genetics: Analysis of genes and genomes (7th ed.). Jones & Bartlett, Sudbury, Massachusetts. He J, Baum LG. (2006) Galectin interactions with extracellular matrix and effects on cellular function. Methods Enzymol. 417:247-56. He M, Gitschier J, Zerjal T, de Knijff P, Tyler-Smith C, Xue Y. (2009) Geographical affinities of the HapMap samples. PLoS One. 4(3):e4684. Hendrix ND, Wu R, Kuick R, Schwartz DR, Fearon ER, Cho KR. (2006) Fibroblast growth factor 9 has oncogenic activity and is a downstream target of Wnt signaling in ovarian endometrioid adenocarcinomas. Cancer Res. 66(3):1354-62. Hernandez JD, Baum LG. (2002) Ah, sweet mystery of death! Galectins and control of cell fate. Glycobiology. 12(10):127R-36R. Hoek KS, Schlegel NC, Brafford P, Sucker A, Ugurel S, Kumar R, Weber BL, Nathanson KL, Phillips DJ, Herlyn M, Schadendorf D, Dummer R. (2006) Metastatic potential of melanomas defined by specific gene expression profiles with no BRAF signature. Pigment Cell Res. 19(4):290-302. Houzelstein D, Gonçalves IR, Orth A, Bonhomme F, Netter P. (2008) Lgals6, a 2-million-year-old gene in mice: a case of positive Darwinian selection and presence/absence polymorphism. Genetics. 178(3):1533-45. Hsu DK, Chen HY, Liu FT. (2009a) Galectin-3 regulates T-cell functions. Immunol Rev. 230(1):114-27. 
Hsu DK, Chernyavsky AI, Chen HY, Yu L, Grando SA, Liu FT. (2009b) Endogenous galectin-3 is localized in membrane lipid rafts and regulates migration of dendritic cells. J Invest Dermatol. 129(3):573-83. Hsu DK, Hammes SR, Kuwabara I, Greene WC, Liu FT. (1996) Human T lymphotropic virus-I infection of human T lymphocytes induces expression of the beta-galactoside-binding lectin, galectin-3. Am J Pathol. 148(5):1661-70. Hsu DK, Liu FT. (2004) Regulation of cellular homeostasis by galectins. Glycoconj J. 19(7-9):507-15. Hsu DK, Yang RY, Liu FT. (2006) Galectins in apoptosis. Methods Enzymol. 417:256-73. Hsu DK, Zuberi RI, Liu FT. (1992) Biochemical and biophysical characterization of human recombinant IgE-binding protein, an S-type animal lectin. J Biol Chem. 267(20):14167-74. Huang Y, Prasad M, Lemon WJ, Hampel H, Wright FA, Kornacker K, LiVolsi V, Frankel W, Kloos RT, Eng C, Pellegata NS, de la Chapelle A. (2001) Gene expression in papillary thyroid carcinoma reveals highly consistent profiles. Proc Natl Acad Sci U S A. 98(26):15044-9. Iacobuzio-Donahue CA, Maitra A, Olsen M, Lowe AW, van Heek NT, Rosty C, Walter K, Sato N, Parker A, Ashfaq R, Jaffee E, Ryu B, Jones J, Eshleman JR, Yeo CJ, Cameron JL, Kern SE, Hruban RH, Brown PO, Goggins M. (2003) Exploration of global gene expression patterns in pancreatic adenocarcinoma using cDNA microarrays. Am J Pathol. 162(4):1151-62. Ilarregui JM, Bianco GA, Toscano MA, Rabinovich GA. (2005) The coming of age of galectins as immunomodulatory agents: impact of these carbohydrate binding proteins in T cell physiology and chronic inflammatory disorders. Ann Rheum Dis. 64 Suppl 4:iv96-103. International HapMap Consortium. (2003) The International HapMap Project. Nature. 426:789-796. International HapMap Consortium. (2005) A haplotype map of the human genome. Nature. 437:1299-1320. Irie A, Yamauchi A, Kontani K, Kihara M, Liu D, Shirato Y, Seki M, Nishi N, Nakamura T, Yokomise H, Hirashima M. 
(2005) Galectin-9 as a prognostic factor with antimetastatic potential in breast cancer. Clin Cancer Res. 11(8):2962-8. Ishikawa M, Yoshida K, Yamashita Y, Ota J, Takada S, Kisanuki H, Koinuma K, Choi YL, Kaneda R, Iwao T, Tamada K, Sugano K, Mano H. (2005) Experimental trial for diagnosis of pancreatic ductal carcinoma based on gene expression profiles of pancreatic ductal cells. Cancer Sci. 96(7):387-93. Kadrofske MM, Openo KP, Wang JL. (1998) The human LGALS3 (galectin-3) gene: determination of the gene structure and functional characterization of the promoter. Arch Biochem Biophys. 349(1):7-20. Kageshita T, Kashio Y, Yamauchi A, Seki M, Abedin MJ, Nishi N, Shoji H, Nakamura T, Ono T, Hirashima M. (2002) Possible role of galectin-9 in cell aggregation and apoptosis of human melanoma cell lines and its clinical significance. Int J Cancer. 99(6):809-16. Kasamatsu A, Uzawa K, Nakashima D, Koike H, Shiiba M, Bukawa H, Yokoe H, Tanzawa H. (2005) Galectin-9 as a regulator of cellular adhesion in human oral squamous cell carcinoma cell lines. Int J Mol Med. 16(2):269-73. Kasowski M, Grubert F, Heffelfinger C, Hariharan M, Asabere A, Waszak SM, Habegger L, Rozowsky J, Shi M, Urban AE, Hong MY, Karczewski KJ, Huber W, Weissman SM, Gerstein MB, Korbel JO, Snyder M. (2010) Variation in transcription factor binding among humans. Science. 328(5975):232-5. King RD, Lubinski JM, Friedman HM. (2009) Herpes simplex virus type 1 infection increases the carbohydrate binding activity and the secretion of cellular galectin-3. Arch Virol. 154(4):609-18. Kobayashi T, Kuroda J, Ashihara E, Oomizu S, Terui Y, Taniyama A, Adachi S, Takagi T, Yamamoto M, Sasaki N, Horiike S, Hatake K, Yamauchi A, Hirashima M, Taniwaki M. (2010) Galectin-9 exhibits anti-myeloma activity through JNK and p38 MAP kinase pathways. Leukemia. 24(4):843-50. Korkola JE, Houldsworth J, Chadalavada RS, Olshen AB, Dobrzynski D, Reuter VE, Bosl GJ, Chaganti RS. 
(2006) Down-regulation of stem cell genes, including those in a 200-kb gene cluster at 12p13.31, is associated with in vivo differentiation of human male germ cell tumors. Cancer Res. 66(2):820-7. Kuwabara I, Sano H, Liu FT. (2003) Functions of galectins in cell adhesion and chemotaxis. Methods Enzymol. 363:532-52. Laderach DJ, Compagno D, Toscano MA, Croci DO, Dergan-Dylon S, Salatino M, Rabinovich GA. (2010) Dissecting the signal transduction pathways triggered by galectin-glycan interactions in physiological and pathological settings. IUBMB Life. 62(1):1-13. LaFramboise T, Dewal N, Wilkins K, Pe'er I, Freedman ML. (2010) Allelic selection of amplicons in glioblastoma revealed by combining somatic and germline analysis. PLoS Genet. 6(9):e1001086. Lagana A, Goetz JG, Cheung P, Raz A, Dennis JW, Nabi IR. (2006) Galectin binding to Mgat5-modified N-glycans regulates fibronectin matrix remodeling in tumor cells. Mol Cell Biol. 26(8):3181-93. Lahm H, André S, Hoeflich A, Kaltner H, Siebert HC, Sordat B, von der Lieth CW, Wolf E, Gabius HJ. (2004) Tumor galectinology: insights into the complex network of a family of endogenous lectins. Glycoconj J. 20(4):227-38. Lancaster JM, Dressman HK, Whitaker RS, Havrilesky L, Gray J, Marks JR, Nevins JR, Berchuck A. (2004) Gene expression patterns that characterize advanced stage serous ovarian cancers. J Soc Gynecol Investig. 11(1):51-9. Lapointe J, Li C, Higgins JP, van de Rijn M, Bair E, Montgomery K, Ferrari M, Egevad L, Rayford W, Bergerheim U, Ekman P, DeMarzo AM, Tibshirani R, Botstein D, Brown PO, Brooks JD, Pollack JR. (2004) Gene expression profiling identifies clinically relevant subtypes of prostate cancer. Proc Natl Acad Sci U S A. 101(3):811-6. Lenburg ME, Liou LS, Gerry NP, Frampton GM, Cohen HT, Christman MF. (2003) Previously unidentified changes in renal cell carcinoma gene expression identified by parametric analysis of microarray data. BMC Cancer. 3:31. 
Leonidas DD, Elbert BL, Zhou Z, Leffler H, Ackerman SJ, Acharya KR. (1995) Crystal structure of human Charcot-Leyden crystal protein, an eosinophil lysophospholipase, identifies it as a new member of the carbohydrate-binding family of galectins. Structure. 3(12):1379-93. Letouze E, Allory Y, Bollet MA, Radvanyi F, Guyon F. (2010) Analysis of the copy number profiles of several tumor samples from the same patient reveals the successive steps in tumorigenesis. Genome Biol. 11(7):R76. Li M, Zhao Y, Li Y, Li C, Chen F, Mao J, Zhang Y. (2009) Upregulation of human with-no-lysine kinase-4 gene expression by GATA-1 acetylation. Int J Biochem Cell Biol. 41(4):872-8. Li ZX, Ma X, Wang ZH. (2006) A differentially methylated region of the DAZ1 gene in spermatic and somatic cells. Asian J Androl. 8(1):61-7. Liang Y, Diehn M, Watson N, Bollen AW, Aldape KD, Nicholas MK, Lamborn KR, Berger MS, Botstein D, Brown PO, Israel MA. (2005) Gene expression profiling reveals molecularly and clinically distinct subtypes of glioblastoma multiforme. Proc Natl Acad Sci U S A. 102(16):5814-9. Liu FT. (2000) Galectins: a new family of regulators of inflammation. Clin Immunol. 97(2):79-88. Liu FT. (2005) Regulatory roles of galectins in the immune response. Int Arch Allergy Immunol. 136:385-400. Liu FT, Hsu DK, Zuberi RI, Hill PN, Shenhav A, Kuwabara I, Chen SS. (1996) Modulation of functional properties of galectin-3 by monoclonal antibodies binding to the non-lectin domains. Biochemistry. 35(19):6073-9. Liu FT, Patterson RJ, Wang JL. (2002) Intracellular functions of galectins. Biochim Biophys Acta. 1572(2-3):263-73. Liu FT, Rabinovich GA. (2005) Galectins as modulators of tumour progression. Nat Rev Cancer. 5(1):29-41. Liu SD, Whiting CC, Tomassian T, Pang M, Bissel SJ, Baum LG, Mossine VV, Poirier F, Huflejt ME, Miceli MC. (2008) Endogenous galectin-1 enforces class I-restricted TCR functional fate decisions in thymocytes. Blood. 112(1):120-30. 
Logsdon CD, Simeone DM, Binkley C, Arumugam T, Greenson JK, Giordano TJ, Misek DE, Kuick R, Hanash S. (2003) Molecular profiling of pancreatic adenocarcinoma and chronic pancreatitis identifies multiple genes differentially regulated in pancreatic cancer. Cancer Res. 63(10):2649-57. Luo J, Duggan DJ, Chen Y, Sauvageot J, Ewing CM, Bittner ML, Trent JM, Isaacs WB. (2001) Human prostate cancer and benign prostatic hyperplasia: molecular dissection by gene expression profiling. Cancer Res. 61(12):4683-8. Luo JH, Yu YP, Cieply K, Lin F, Deflavia P, Dhir R, Finkelstein S, Michalopoulos G, Becich M. (2002) Gene expression analysis of prostate cancers. Mol Carcinog. 33(1):25-35. Mangino M, Braund P, Singh R, Steeds R, Thompson JR, Channer K, Samani NJ. (2007) LGALS2 functional variant rs7291467 is not associated with susceptibility to myocardial infarction in Caucasians. Atherosclerosis. 194(1):112-5. Mann PS. (2004) Introductory Statistics (5th ed.). John Wiley & Sons Inc., Hoboken, New Jersey. Manolio T. (2010) Genomewide Association Studies and Assessment of the Risk of Disease. N Engl J Med. 363(2):166-76. Massa SM, Cooper DN, Leffler H, Barondes SH. (1993) L-29, an endogenous lectin, binds to glycoconjugate ligands with positive cooperativity. Biochemistry. 32(1):260-7. Matarrese P, Fusco O, Tinari N, Natoli C, Liu FT, Semeraro ML, Malorni W, Iacobelli S. (2000) Galectin-3 overexpression protects from apoptosis by improving cell adhesion properties. Int J Cancer. 85(4):545-54. Matsumoto R, Matsumoto H, Seki M, Hata M, Asano Y, Kanegasaki S, Stevens RL, Hirashima M. (1998) Human ecalectin, a variant of human galectin-9, is a novel eosinophil chemoattractant produced by T lymphocytes. J Biol Chem. 273(27):16976-84. Mazurek N, Conklin J, Byrd JC, Raz A, Bresalier RS. (2000) Phosphorylation of the beta-galactoside-binding protein galectin-3 modulates binding to its ligands. J Biol Chem. 275(46):36311-5. McDonald JH. (2009) Handbook of Biological Statistics (2nd ed.). 
Sparky House Publishing, Baltimore, Maryland. Motran CC, Molinder KM, Liu SD, Poirier F, Miceli MC. (2008) Galectin-1 functions as a Th2 cytokine that selectively induces Th1 apoptosis and promotes Th2 function. Eur J Immunol. 38(11):3015-27. Mutter GL, Baak JP, Fitzgerald JT, Gray R, Neuberg D, Kust GA, Gentleman R, Gullans SR, Wei LJ, Wilcox M. (2001) Global expression changes of constitutive and hormonally regulated genes during endometrial neoplastic transformation. Gynecol Oncol. 83(2):177-85. Nakahara S, Oka N, Raz A. (2005) On the role of galectin-3 in cancer apoptosis. Apoptosis. 10(2):267-75. Nakahara S, Oka N, Wang Y, Hogan V, Inohara H, Raz A. (2006) Characterization of the nuclear import pathways of galectin-3. Cancer Res. 66(20):9995-10006. Nakahara S, Raz A. (2006) On the role of galectins in signal transduction. Methods Enzymol. 417:273-89. Nancarrow DJ, Handoko HY, Smithers M, Gotley DC, Drew PA, Watson DI, Clouston AD, Hayward NK, Whiteman DC. (2008) Genome-wide copy number analysis in esophageal adenocarcinoma using high-density single-nucleotide polymorphism arrays. Cancer Res. 68(11):4163-72. Nebert DW, Zhang G, Vesell ES. (2008) From human genetics and genomics to pharmacogenetics and pharmacogenomics: past lessons, future directions. Drug Metab Rev. 40(2):187-224. Nguyen JT, Evans DP, Galvan M, Pace KE, Leitenberg D, Bui TN, Baum LG. (2001) CD45 modulates galectin-1-induced T cell death: regulation by expression of core 2 O-glycans. J Immunol. 167(10):5697-707. Nieminen J, St-Pierre C, Sato S. (2005) Galectin-3 interacts with naive and primed neutrophils, inducing innate immune responses. J Leukoc Biol. 78(5):1127-35. Nobumoto A, Nagahara K, Oomizu S, Katoh S, Nishi N, Takeshita K, Niki T, Tominaga A, Yamauchi A, Hirashima M. (2008) Galectin-9 suppresses tumor metastasis by blocking adhesion to endothelium and extracellular matrices. Glycobiology. 18(9):735-44. 
Novak AJ, Slager SL, Fredericksen ZS, Wang AH, Manske MM, Ziesmer S, Liebow M, Macon WR, Dillon SR, Witzig TE, Cerhan JR, Ansell SM. (2009) Genetic variation in B-cell-activating factor is associated with an increased risk of developing B-cell non-Hodgkin lymphoma. Cancer Res. 69(10):4217-24. Ochieng J, Furtak V, Lukyanov P. (2004) Extracellular functions of galectin-3. Glycoconj J. 19(7-9):527-35. Oka N, Nakahara S, Takenaka Y, Fukumori T, Hogan V, Kanayama HO, Yanagawa T, Raz A. (2005) Galectin-3 inhibits tumor necrosis factor-related apoptosis-inducing ligand-induced apoptosis by activating Akt in human bladder carcinoma cells. Cancer Res. 65(17):7546-53. Okumura CY, Baum LG, Johnson PJ. (2008) Galectin-1 on cervical epithelial cells is a receptor for the sexually transmitted human parasite Trichomonas vaginalis. Cell Microbiol. 10(10):2078-90. Ozaki K, Inoue K, Sato H, Iida A, Ohnishi Y, Sekine A, Sato H, Odashiro K, Nobuyoshi M, Hori M, Nakamura Y, Tanaka T. (2004) Functional variation in LGALS2 confers risk of myocardial infarction and regulates lymphotoxin-alpha secretion in vitro. Nature. 429(6987):72-5. Park JW, Voss PG, Grabski S, Wang JL, Patterson RJ. (2001) Association of galectin-1 and galectin-3 with Gemin4 in complexes containing the SMN protein. Nucleic Acids Res. 29(17):3595-602. Peng W, Wang HY, Miyahara Y, Peng G, Wang RF. (2008) Tumor-associated galectin-3 modulates the function of tumor-reactive T cells. Cancer Res. 68(17):7228-36. Perillo NL, Pace KE, Seilhamer JJ, Baum LG. (1995) Apoptosis of T cells mediated by galectin-1. Nature. 378(6558):736-9. Plzak J, Betka J, Smetana K, Chovanec M, Kaltner H, Andre S, Kodet R, Gabius HJ. (2004) Galectin-3 - an emerging prognostic indicator in advanced head and neck carcinoma. Eur J Cancer. 40(15):2324-30. Popova T, Manié E, Stoppa-Lyonnet D, Rigaill G, Barillot E, Stern MH. (2009) Genome Alteration Print (GAP): a tool to visualize and mine complex cancer genomic profiles obtained by SNP arrays. 
Genome Biol. 10(11):R128. Prieto VG, Mourad-Zeidan AA, Melnikova V, Johnson MM, Lopez A, Diwan AH, Lazar AJ, Shen SS, Zhang PS, Reed JA, Gershenwald JE, Raz A, Bar-Eli M. (2006) Galectin-3 expression is associated with tumor progression and pattern of sun exposure in melanoma. Clin Cancer Res. 12(22):6709-15. Quade BJ, Wang TY, Sornberger K, Dal Cin P, Mutter GL, Morton CC. (2004) Molecular pathogenesis of uterine smooth muscle tumors from transcriptional profiling. Genes Chromosomes Cancer. 40(2):97-108. Rabinovich GA, Ariel A, Hershkoviz R, Hirabayashi J, Kasai KI, Lider O. (1999a) Specific inhibition of T-cell adhesion to extracellular matrix and proinflammatory cytokine secretion by human recombinant galectin-1. Immunology. 97(1):100-6. Rabinovich GA, Liu FT, Hirashima M, Anderson A. (2007) An emerging role for galectins in tuning the immune response: Lessons from experimental models of inflammatory diseases, autoimmunity and cancer. Scand J Immunol. 66:143-58. Rabinovich GA, Riera CM, Landa CA, Sotomayor CE. (1999b) Galectins: a key intersection between glycobiology and immunology. Braz J Med Biol Res. 32(4):383-93. Rabinovich GA, Rubinstein N, Fainboim L. (2002) Unlocking the secrets of galectins: a challenge of glyco-immunology. J Leukoc Biol. 71:741-52. Rabinovich GA, Toscano MA. (2009) Turning 'sweet' on immunity: galectin-glycan interactions in immune tolerance and inflammation. Nat Rev Immunol. 9(5):338-52. Rabinovich GA, Toscano MA, Ilarregui JM, Rubinstein N. (2004) Shedding light on the immunomodulatory properties of galectins: novel regulators of innate and adaptive immune responses. Glycoconj J. 19(7-9):565-73. Raimond J, Rouleux F, Monsigny M, Legrand A. (1995) The second intron of the human galectin-3 gene has a strong promoter activity down-regulated by p53. FEBS Lett. 363(1-2):165-9. Rakha EA, Pinder SE, Paish EC, Robertson JF, Ellis IO. 
(2004) Expression of E2F-4 in invasive breast carcinomas is associated with poor prognosis. J Pathol. 203(3):754-61. Rhodes DR, Kalyana-Sundaram S, Mahavisno V, Varambally R, Yu J, Briggs BB, Barrette TR, Anstet MJ, Kincead-Beal C, Kulkarni P, Varambally S, Ghosh D, Chinnaiyan AM. (2007) Oncomine 3.0: genes, pathways, and networks in a collection of 18,000 cancer gene expression profiles. Neoplasia. 9:166-180. Rhodes DR, Yu J, Shanker K, Deshpande N, Varambally R, Ghosh D, Barrette T, Pandey A, Chinnaiyan AM. (2004) ONCOMINE: a cancer microarray database and integrated data-mining platform. Neoplasia. 6:1-6. Richardson AL, Wang ZC, De Nicolo A, Lu X, Brown M, Miron A, Liao X, Iglehart JD, Livingston DM, Ganesan S. (2006) X chromosomal abnormalities in basal-like human breast cancer. Cancer Cell. 9(2):121-32. Rickman DS, Bobek MP, Misek DE, Kuick R, Blaivas M, Kurnit DM, Taylor J, Hanash SM. (2001) Distinctive molecular profiles of high-grade and low-grade gliomas based on oligonucleotide microarray analysis. Cancer Res. 61(18):6885-91. Ruebel KH, Jin L, Qian X, Scheithauer BW, Kovacs K, Nakamura N, Zhang H, Raz A, Lloyd RV. (2005) Effects of DNA methylation on galectin-3 expression in pituitary tumors. Cancer Res. 65(4):1136-40. Salatino M, Rabinovich GA. (2011) Fine-tuning antitumor responses through the control of galectin-glycan interactions: an overview. Methods Mol Biol. 677:355-74. Sanchez-Carbayo M, Socci ND, Lozano J, Saint F, Cordon-Cardo C. (2006) Defining molecular profiles of poor outcome in patients with invasive bladder cancer using oligonucleotide microarrays. J Clin Oncol. 24(5):778-89. Sano H, Hsu DK, Apgar JR, Yu L, Sharma BB, Kuwabara I, Izui S, Liu FT. (2003) Critical role of galectin-3 in phagocytosis by macrophages. J Clin Invest. 112(3):389-97. Sano H, Hsu DK, Yu L, Apgar JR, Kuwabara I, Yamanaka T, Hirashima M, Liu FT. (2000) Human galectin-3 is a novel chemoattractant for monocytes and macrophages. J Immunol. 165(4):2156-64. 
Santucci L, Fiorucci S, Rubinstein N, Mencarelli A, Palazzetti B, Federici B, Rabinovich GA, Morelli A. (2003) Galectin-1 suppresses experimental colitis in mice. Gastroenterology. 124(5):1381-94. Sato S, Hughes RC. (1992) Binding specificity of a baby hamster kidney lectin for H type I and II chains, polylactosamine glycans, and appropriately glycosylated forms of laminin and fibronectin. J Biol Chem. 267(10):6983-90. Sato S, Ouellet N, Pelletier I, Simard M, Rancourt A, Bergeron MG. (2002) Role of galectin-3 as an adhesion molecule for neutrophil extravasation during streptococcal pneumonia. J Immunol. 168(4):1813-22. Saussez S, Kiss R. (2006) Galectin-7. Cell Mol Life Sci. 63(6):686-97. Saxonov S, Berg P, Brutlag DL. (2006) A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters. Proc Natl Acad Sci U S A. 103(5):1412-7. Sedlacek K, Neureuther K, Mueller JC, Stark K, Fischer M, Baessler A, Reinhard W, Broeckel U, Lieb W, Erdmann J, Schunkert H, Riegger G, Illig T, Meitinger T, Hengstenberg C. (2007) Lymphotoxin-alpha and galectin-2 SNPs are not associated with myocardial infarction in two different German populations. J Mol Med. 85(9):997-1004. Shai R, Shi T, Kremen TJ, Horvath S, Liau LM, Cloughesy TF, Mischel PS, Nelson SF. (2003) Gene expression profiling identifies molecular subtypes of gliomas. Oncogene. 22(31):4918-23. Skotheim RI, Lind GE, Monni O, Nesland JM, Abeler VM, Fosså SD, Duale N, Brunborg G, Kallioniemi O, Andrews PW, Lothe RA. (2005) Differentiation of human embryonal carcinomas in vitro and in vivo reveals expression profiles relevant to normal development. Cancer Res. 65(13):5588-98. Sperger JM, Chen X, Draper JS, Antosiewicz JE, Chon CH, Jones SB, Brooks JD, Andrews PW, Brown PO, Thomson JA. (2003) Gene expression patterns in human embryonic stem cells and human pluripotent germ cell tumors. Proc Natl Acad Sci U S A. 100(23):13350-5. 
Staaf J, Lindgren D, Vallon-Christersson J, Isaksson A, Göransson H, Juliusson G, Rosenquist R, Höglund M, Borg A, Ringner M. (2008) Segmentation-based detection of allelic imbalance and loss-of-heterozygosity in cancer cells using whole genome SNP arrays. Genome Biol. 9(9):R136. Stark M, Hayward N. (2007) Genome-wide loss of heterozygosity and copy number analysis in melanoma using high-density single-nucleotide polymorphism arrays. Cancer Res. 67(6):2632-42. Stegmaier K, Ross KN, Colavito SA, O'Malley S, Stockwell BR, Golub TR. (2004) Gene expression-based high-throughput screening (GE-HTS) and application to leukemia differentiation. Nat Genet. 36:257-63. Stillman BN, Hsu DK, Pang M, Brewer CF, Johnson P, Liu FT, Baum LG. (2006) Galectin-3 and galectin-1 bind distinct cell surface glycoprotein receptors to induce T cell death. J Immunol. 176(2):778-89. Sturm A, Lensch M, André S, Kaltner H, Wiedenmann B, Rosewicz S, Dignass AU, Gabius HJ. (2004) Human galectin-2: novel inducer of T cell apoptosis with distinct profile of caspase activation. J Immunol. 173(6):3825-37. Sun L, Hui AM, Su Q, Vortmeyer A, Kotliarov Y, Pastorino S, Passaniti A, Menon J, Walling J, Bailey R, Rosenblum M, Mikkelsen T, Fine HA. (2006) Neuronal and glioma-derived stem cell factor induces angiogenesis within the brain. Cancer Cell. 9(4):287-300. Takai D, Jones PA. (2002) Comprehensive analysis of CpG islands in human chromosomes 21 and 22. Proc Natl Acad Sci U S A. 99(6):3740-5. Talantov D, Mazumder A, Yu JX, Briggs T, Jiang Y, Backus J, Atkins D, Wang Y. (2005) Novel genes associated with malignant melanoma but not benign melanocytic lesions. Clin Cancer Res. 11(20):7234-42. Talbot SG, Estilo C, Maghami E, Sarkaria IS, Pham DK, O-charoenrat P, Socci ND, Ngai I, Carlson D, Ghossein R, Viale A, Park BJ, Rusch VW, Singh B. (2005) Gene expression profiling allows distinction between primary and metastatic squamous cell carcinomas in the lung. Cancer Res. 65(8):3063-71. 
Talseth BA, Meldrum C, Suchy J, Kurzawski G, Lubinski J, Scott RJ. (2006) Genetic polymorphisms in xenobiotic clearance genes and their influence on disease expression in hereditary nonpolyposis colorectal cancer patients. Cancer Epidemiol Biomarkers Prev. 15(11):2307-10. Tang JZ, Kong XJ, Kang J, Fielder GC, Steiner M, Perry JK, Wu ZS, Yin Z, Zhu T, Liu DX, Lobie PE. (2010) Artemin-stimulated progression of human non-small cell lung carcinoma is mediated by BCL2. Mol Cancer Ther. 9(6):1697-708. Teichberg VI, Silman I, Beitsch DD, Resheff G. (1975) A beta-D-galactoside binding protein from electric organ tissue of Electrophorus electricus. Proc Natl Acad Sci U S A. 72(4):1383-7. Than NG, Romero R, Goodman M, Weckle A, Xing J, Dong Z, Xu Y, Tarquini F, Szilagyi A, Gal P, Hou Z, Tarca AL, Kim CJ, Kim JS, Haidarian S, Uddin M, Bohn H, Benirschke K, Santolaya-Forgas J, Grossman LI, Erez O, Hassan SS, Zavodszky P, Papp Z, Wildman DE. (2009) A primate subfamily of galectins expressed at the maternal-fetal interface that promote immune cell death. Proc Natl Acad Sci U S A. 106(24):9731-6. Thijssen VL, Postel R, Brandwijk RJ, Dings RP, Nesmelova I, Satijn S, Verhofstad N, Nakabeppu Y, Baum LG, Bakkers J, Mayo KH, Poirier F, Griffioen AW. (2006) Galectin-1 is essential in tumor angiogenesis and is a target for antiangiogenesis therapy. Proc Natl Acad Sci U S A. 103(43):15975-80. Tomlins SA, Mehra R, Rhodes DR, Cao X, Wang L, Dhanasekaran SM, Kalyana-Sundaram S, Wei JT, Rubin MA, Pienta KJ, Shah RB, Chinnaiyan AM. (2007) Integrative molecular concept modeling of prostate cancer progression. Nat Genet. 39(1):41-51. Toruner GA, Ulger C, Alkan M, Galante AT, Rinaggio J, Wilk R, Tian B, Soteropoulos P, Hameed MR, Schwalb MN, Dermody JJ. (2004) Association between gene expression profile and tumor invasion in oral squamous cell carcinoma. Cancer Genet Cytogenet. 154(1):27-35. 
Toscano MA, Bianco GA, Ilarregui JM, Croci DO, Correale J, Hernandez JD, Zwirner NW, Poirier F, Riley EM, Baum LG, Rabinovich GA. (2007) Differential glycosylation of TH1, TH2 and TH17 effector cells selectively regulates susceptibility to cell death. Nat Immunol. 8(8):825-34. Valenzuela HF, Pace KE, Cabrera PV, White R, Porvari K, Kaija H, Vihko P, Baum LG. (2007) O-glycosylation regulates LNCaP prostate cancer cell susceptibility to apoptosis induced by galectin-1. Cancer Res. 67(13):6155-62. van den Brûle F, Califice S, Castronovo V. (2004) Expression of galectins in cancer: a critical review. Glycoconj J. 19(7-9):537-42. Varambally S, Yu J, Laxman B, Rhodes DR, Mehra R, Tomlins SA, Shah RB, Chandran U, Monzon FA, Becich MJ, Wei JT, Pienta KJ, Ghosh D, Rubin MA, Chinnaiyan AM. (2005) Integrative genomic and proteomic analysis of prostate cancer reveals signatures of metastatic progression. Cancer Cell. 8(5):393-406. Viviani Anselmi C, Novelli V, Roncarati R, Malovini A, Bellazzi R, Bronzini R, Marchese G, Condorelli G, Montenero AS, Puca AA. (2008) Association of rs2200733 at 4q25 with atrial flutter/fibrillation diseases in an Italian population. Heart. 94:1394-1396. Vray B, Camby I, Vercruysse V, Mijatovic T, Bovin NV, Ricciardi-Castagnoli P, Kaltner H, Salmon I, Gabius HJ, Kiss R. (2004) Up-regulation of galectin-3 and its ligands by Trypanosoma cruzi infection with modulation of adhesion and migration of murine dendritic cells. Glycobiology. 14(7):647-57. Vyakarnam A, Dagher SF, Wang JL, Patterson RJ. (1997) Evidence for a role for galectin-1 in pre-mRNA splicing. Mol Cell Biol. 17(8):4730-7. Wachi S, Yoneda K, Wu R. (2005) Interactome-transcriptome analysis reveals the high centrality of genes differentially expressed in lung cancer tissues. Bioinformatics. 21(23):4205-8. 
Waddell N, Arnold J, Cocciardi S, da Silva L, Marsh A, Riley J, Johnstone CN, Orloff M, Assie G, Eng C, Reid L, Keith P, Yan M, Fox S, Devilee P, Godwin AK, Hogervorst FB, Couch F; kConFab Investigators, Grimmond S, Flanagan JM, Khanna K, Simpson PT, Lakhani SR, Chenevix-Trench G. (2010) Subtypes of familial breast tumours revealed by expression and copy number profiling. Breast Cancer Res Treat. 123(3):661-77.
Wang S, Zhan M, Yin J, Abraham JM, Mori Y, Sato F, Xu Y, Olaru A, Berki AT, Li H, Schulmann K, Kan T, Hamilton JP, Paun B, Yu MM, Jin Z, Cheng Y, Ito T, Mantzur C, Greenwald BD, Meltzer SJ. (2006) Transcriptional profiling suggests that Barrett's metaplasia is an early intermediate stage in esophageal adenocarcinogenesis. Oncogene. 25(23):3346-56.
Wang Y, Leung FC. (2004) An evaluation of new criteria for CpG islands in the human genome as gene markers. Bioinformatics. 20(7):1170-7.
Welsh JB, Sapinoso LM, Su AI, Kern SG, Wang-Rodriguez J, Moskaluk CA, Frierson HF Jr, Hampton GM. (2001a) Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer. Cancer Res. 61(16):5974-8.
Welsh JB, Zarrinkar PP, Sapinoso LM, Kern SG, Behling CA, Monk BJ, Lockhart DJ, Burger RA, Hampton GM. (2001b) Analysis of gene expression profiles in normal and neoplastic ovarian tissue samples identifies candidate molecular markers of epithelial ovarian cancer. Proc Natl Acad Sci U S A. 98(3):1176-81.
Xu XC, el-Naggar AK, Lotan R. (1995) Differential expression of galectin-1 and galectin-3 in thyroid tumors. Potential diagnostic implications. Am J Pathol. 147(3):815-22.
Xu XC, Sola Gallego JJ, Lotan R, El-Naggar AK. (2000) Differential expression of galectin-1 and galectin-3 in benign and malignant salivary gland neoplasms. Int J Oncol. 17(2):271-6.
Yamada Y, Kato K, Oguri M, Yoshida T, Yokoi K, Watanabe S, Metoki N, Yoshida H, Satoh K, Ichihara S, Aoyagi Y, Yasunaga A, Park H, Tanaka M, Nozawa Y. (2008) Association of genetic variants with atherothrombotic cerebral infarction in Japanese individuals with metabolic syndrome. Int J Mol Med. 21(6):801-8.
Yamauchi A, Kontani K, Kihara M, Nishi N, Yokomise H, Hirashima M. (2006) Galectin-9, a novel prognostic factor with antimetastatic potential in breast cancer. Breast J. 12(5 Suppl 2):S196-200.
Yang RY, Hsu DK, Liu FT. (1996) Expression of galectin-3 modulates T-cell growth and apoptosis. Proc Natl Acad Sci U S A. 93(13):6737-42.
Yang RY, Rabinovich GA, Liu FT. (2008) Galectins: structure, function and therapeutic potential. Expert Rev Mol Med. 10:e17.
Yoshii T, Fukumori T, Honjo Y, Inohara H, Kim HR, Raz A. (2002) Galectin-3 phosphorylation is required for its anti-apoptotic function and cell cycle arrest. J Biol Chem. 277(9):6852-7.
Yu F, Finley RL Jr, Raz A, Kim HR. (2002) Galectin-3 translocates to the perinuclear membranes and inhibits cytochrome c release from the mitochondria: a role for synexin in galectin-3 translocation. J Biol Chem. 277(18):15819-27.
Yu YP, Landsittel D, Jing L, Nelson J, Ren B, Liu L, McDonald C, Thomas R, Dhir R, Finkelstein S, Michalopoulos G, Becich M, Luo JH. (2004) Gene expression alterations in prostate cancer predicting tumor aggression and preceding development of malignancy. J Clin Oncol. 22(14):2790-9.
Zou J, Glinsky VV, Landon LA, Matthews L, Deutscher SL. (2005) Peptides specific to the galectin-3 carbohydrate recognition domain inhibit metastasis-associated cancer cell adhesion. Carcinogenesis. 26(2):309-18.
Zuberi RI, Frigeri LG, Liu FT. (1994) Activation of rat basophilic leukemia cells by epsilon BP, an IgE-binding endogenous lectin. Cell Immunol. 156(1):1-12.
Zuberi RI, Hsu DK, Kalayci O, Chen HY, Sheldon HK, Yu L, Apgar JR, Kawakami T, Lilly CM, Liu FT. (2004) Critical role for galectin-3 in airway inflammation and bronchial hyperresponsiveness in a murine model of asthma. Am J Pathol. 165(6):2045-53.