Genetic Analysis of Quality Traits for Food-Grade Soybean (Glycine max L. Merr.) in a Breeding Population

THESIS

Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The Ohio State University

By
Mao Huang
Graduate Program in Horticulture and Crop Science

The Ohio State University
2012

Master's Examination Committee:
Dr. Leah K. McHale, Advisor
Dr. David Francis
Dr. M. A. Rouf Mian
Dr. Gonul Kaletunc

Copyrighted by
Mao Huang
2012

Abstract

Food-grade soybeans are a specialty crop with unique chemical and physical seed quality requirements. The tofu production market requires large, round soybean seed with a clear hilum and high protein content. The aim of this project was to detect QTL responsible for traits important in food-grade soybeans, and to provide information for the selection of food-grade soybeans in a breeding program. This study assessed seed shape, size, density, weight, protein content, and oil content in two independent populations of breeding lines and cultivars, as well as textural traits of the tofu produced from a subset of those. Analysis of the relationship between seed traits and tofu texture confirmed the previously established positive correlation between seed protein content and the stiffness and gel strength of tofu. Little to no correlation between seed size and shape measurements and the texture of tofu was detected, indicating that the consumer preference for large, round seeds may not relate to tofu quality. The effects of selection for specialty traits on the genetic differentiation within a breeding program were assessed; the results showed that both parental and progeny selection contributed to genetic differentiation and population structure. Association mapping was conducted with 504 markers in a population of 242 breeding lines and cultivars, leading to the identification of 50 significant marker-trait associations.
Of the 27 significant associations tested in another independent confirmation population consisting of 152 lines or cultivars, 12 were confirmed. These results can be directly applied to increase selection efficiency in a breeding program.

Dedication

To all whom I love and who have been supporting me and pouring their love into my life!

Acknowledgments

First, I sincerely thank my adviser, Dr. Leah K. McHale, for her teaching and guidance. I am very grateful to have been part of her lab for more than two years. Her profound knowledge of plant breeding has helped me gradually develop my interest in this area. I have learned from Dr. McHale not only knowledge and skills, but also an optimistic attitude towards scientific research. Most importantly, her personal character has influenced me, as well as all other lab members, and has made for a cheerful working atmosphere in the lab. Dr. McHale has always been very supportive and encouraging to her students. I feel honored to be her student and lucky to have shared a valuable part of my life's path with Dr. McHale.

I want to thank all my committee members: Dr. David Francis, Dr. Gonul Kaletunc, and Dr. Rouf Mian. They have all taught me patiently and supported this project. Dr. David Francis especially has invested his time in showing me how to be a better researcher. I am thankful that they have provided valuable comments and suggestions for this project and have generously given their time to this dissertation.

I thank Dr. Steve St. Martin wholeheartedly for his dedicated teaching. Dr. St. Martin has provided patient guidance to help me better understand this project.

I also want to thank all my lab members and the student workers who have supported and encouraged me with this project: Dr.
Veena Ganeshan, Jeesica Schwartz, Qianli Shen, Kamila Rezende Dazio de Souza, Brad Snyder and our previous lab member Sumin Lee are good friends to work with; I have learned different things from each of them. I also want to thank Rhiannon Schneider, Elizabeth Baskin, Christine Dubler, Angela Parker, and our lab manager Amanda Gutek for spending their time helping me with my experiments. In addition, I want to thank Dr. Asela Wijeratne, Jody Whittier from MCIC, and Dr. Stephen Opiyo for their help with the genotyping process.

I thank my beloved mum, dad and all my other family members. They have strongly supported me from across the ocean in China. Their unconditional love is the strongest source of my strength!

I appreciate the short internship at Pioneer, which allowed me to learn about and broaden my experience of industrial production; I want to thank my internship supervisor and colleagues, who encouraged and taught me: Joseph Stull, Ryan Morrison, John Woods, Paula Burkholder, Goran Synic, and David Whitaker.

Many thanks also go to my wonderful friends and classmates who have supported me for more than two years: Bruce and Karen Messenger, Tim and LaRonda Stauffer, Mark and Amy Newmeyer, Karen Oliva, Rich Mendola, Daniel Thomas, Leighton Buntain, Andika Gunadi, Jose Mendoza, Lisa Friedberg, Xuan Zhu, Xueqing Geng, Ruiqiang Liu, Yanru Liang, Dan Liu, Lingzhi Li, Beizhen Hu, Zhifen Zhang, Yuting Chen, and Lin Jin.

I also thank the Ohio Soybean Council for funding support!

Vita

March, 1988: Born, Hunan Province, China
2010: B.S. Agronomy, China Agricultural University
2012: Internship, Pioneer Hi-Bred, Plain City, Ohio
2012 to present: Graduate Research Associate, Department of Horticulture and Crop Science, The Ohio State University

Fields of Study

Major Field: Horticulture and Crop Science
Minor Field: Statistics

Table of Contents

Abstract
Acknowledgments
Vita
List of Tables
List of Figures
Literature review
  Introduction
    The importance of specific traits in food-grade soybeans
    Heritabilities and loci for food-grade traits
    Methods and tools for genetic analysis of traits in soybean
    Association mapping
  Objectives
  Tables and figures
  References
Chapter 2: Correlations of seed traits with tofu characteristics in 49 soybean cultivars and breeding lines
  Introduction
  Materials and Methods
    Seed material
    Seed measurements
    Tofu production
    Textural analysis of tofu
    Statistical Analyses
  Results
    Seed measurements
    Textural qualities of tofu
    Correlations between textural traits of tofu and seed measurements
  Discussion
  Tables and Figures
  References
Chapter 3: Analysis of population structure in a soybean breeding program for commodity and specialty types
  Introduction
  Materials and Methods
    Plant Populations and DNA Isolation
    Collection of Genotypic Data
    Collection of Phenotypic Data
    Statistical Analysis of Phenotypic Data
    Analysis of Population Substructure
  Results
    Genotypic data
    Population structure
    Effect of pedigree on population structure
    Effect of phenotypic selection on population structure
    Differentiation of phenotypes among populations and groups
  Discussion
  Tables and figures
  References
Chapter 4: Association mapping of food-grade quality traits in a soybean breeding program for commodity and food-grade cultivars
  Introduction
  Materials and Methods
    Initial population
    Confirmation population
    Phenotypic data collection
    Statistical analysis of phenotypes
    Genotypic data collection
    Association mapping
  Results
  Discussion
  Tables and Figures
  References
Bibliography

List of Tables

Table 1.1. Positional information for oil, protein content and seed size QTL.
Table 2.1. Phenotyped cultivars and breeding lines.
Table 2.2. Variance for tofu textural traits.
Table 2.3. Pearson's correlation coefficient between textural traits of tofu.
Table 2.4. Pearson's correlation coefficient between tofu textural measurements and seed measurements.
Table 3.1. Pedigrees, locations, groups and population membership for cultivars and breeding lines.
Table 3.2. Correlation coefficients for subpopulations and PCs and mean PC values for groups.
Table 3.3. Eigenvectors for all of the phenotypic variables contributing to phenotypic PC1 and PC2.
Table 4.1. Confirmation population breeding lines and their pedigrees as well as lines used as checks.
Table 4.2. Proportion of the observed phenotypic variance (σp²) explained by genetic variance for BLUP values of seed traits in the initial and confirmation populations.
Table 4.3. Significant marker-trait associations.

List of Figures

Figure 2.1. Workflow of tofu production.
Figure 2.2. Force-deformation plot of tofu penetration analysis.
Figure 2.3. Histograms of LSmeans for tofu texture traits and seed traits.
Figure 2.4. Scatter plots comparing LSmeans of protein and oil content.
Figure 2.5. Scatter plots comparing LSmeans of tofu textural traits.
Figure 2.6. Scatter plots comparing LSmeans of tofu textural traits versus seed traits.
Figure 3.1. Plot of ∆K.
Figure 3.2. Bayesian admixture proportion for individual soybean lines with the K = 2 population model.
Figure 3.3. PCA plot of genotypic data.
Figure 3.4. Bar graph displaying the percentage of alleles attributed to the major subpopulation for each group defined on the basis of pedigree or phenotype.
Figure 3.5. PCA plots of phenotypic data.
Figure 4.1. LD decay for the initial mapping population.
Figure 4.2. Manhattan plots of the MLM result for marker associations with seed traits.
Figure 4.3. Display of LD selected chromosomes.

CHAPTER 1
LITERATURE REVIEW

INTRODUCTION

Soybean (Glycine max L. Merr.) is the second largest crop in the U.S. (Zhang et al., 2010). In 2009, soybean had an estimated planting area of 77.45 million acres, the highest recorded in the U.S. since 1960 (National Agricultural Statistics Service, 2010). As a commodity seed crop with high oil and protein content, soybean is primarily used as a source of vegetable oil and protein meal.
A small but increasing portion of soybean acreage is devoted to food-grade soybeans, the raw material for making tofu, miso, edamame, soymilk, soy sauce, natto and tempeh. The successful breeding of food-grade soybeans will benefit farmers and consumers, as well as the Ohio economy, to which soybean production is a large contributor (Ohio Soybean Council, 2011). Soy-food demand is growing, partly due to the nutraceutical value of soybean (Rao et al., 2002), as well as its value as an inexpensive source of protein (Cheng et al., 1990; Hong et al., 2004).

The identification of molecular markers associated with food-grade traits can enhance the selection of qualified soybean cultivars. However, genetic information on many food-grade soybean traits is currently limited (Shi et al., 2010). Seed traits for which quantitative trait loci (QTL) analyses have been conducted have been primarily limited to seed protein, oil content, and seed weight. A few QTL have been identified for seed shape (Salas et al., 2006), and no QTL have been detected for the quality of produced tofu or the water uptake efficiency of seed.

The importance of specific traits in food-grade soybeans

Traits important for food-grade soybean cultivars differ from those of commodity soybean cultivars, for which breeders ultimately select for yield above all other traits (Nichols et al., 2006). Traits important in food-grade soybean cultivars include a collection of specific features, such as seed shape, size, color, efficiency of water uptake, and protein and oil content (Poysa et al., 2002). The total content and composition of seed protein largely determine tofu yield and texture. Harovinton, the Canadian quality-standard tofu-type soybean cultivar, has a protein content of approximately 44% (Poysa et al., 2002; Poysa et al., 2006). The texture of tofu can be enhanced by A3 glycinin subunits, which also contribute to tofu firmness (Poysa et al., 2006).
Large, round-shaped seeds are suitable for making tofu, edamame, miso and soymilk, while smaller seeds are used in the production of natto (Poysa et al., 2002; Zhang et al., 2010). A low surface-to-volume ratio reduces the amount of material lost as residue during tofu processing; thus, large round seeds are desirable for tofu-producing soybeans (Poysa et al., 2002). In contrast, small seeds with high water uptake capacity, high carbohydrate content and low oil content are conducive to the fermentation process of natto-type soybeans (Mullin and Xu, 2001; Wei and Chang, 2004). For tofu production, a colorless seed coat and hilum is preferred; however, a yellow seed coat and hilum is acceptable, based on the characteristics of Harovinton (Poysa et al., 2002; Poysa et al., 2006). For natto production, a pale yellow, smooth seed coat is preferred (Wei and Chang, 2004). Large seeds are desirable for miso and edamame production (Salas et al., 2006).

Heritabilities and loci for food-grade traits

Soybean oil and protein content are important and heritable traits, which have been extensively studied (Diers et al., 1992; Chung et al., 2003; Clemente and Cahoon, 2009; Bolon et al., 2010). The broad-sense heritabilities for protein and for oil content have been reported as 0.84 and 0.91, respectively, in a recombinant inbred line (RIL) population with 131 F6-derived lines (Hyten et al., 2004). There is a well-established negative correlation between oil and protein content in soybean seed (Liang et al., 2010). In contrast to protein and oil content and seed weight, knowledge of the heritabilities of other traits such as soybean seed shape, water uptake and firmness of tofu is lacking. Soybean product quality, such as tofu firmness, is related to and can be influenced by seed protein content (Poysa et al., 2006). Seed shape is an important trait for food-grade soybean (Salas et al., 2006; Xu et al., 2011). Seed shape is determined by seed length (SL), seed width (SW) and seed height (SH).
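The shape descriptors discussed in this section (the SL:SW and SL:SH ratios and the calculated seed volume SL × SW × SH) follow directly from the three seed dimensions. The following is a minimal sketch of that arithmetic, not the measurement procedure of any cited study; the class name, function name, units and example values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeedDimensions:
    """Seed length (SL), width (SW) and height (SH); millimetres are assumed here."""
    length_mm: float
    width_mm: float
    height_mm: float


def shape_descriptors(seed: SeedDimensions) -> dict:
    """Return the SL:SW and SL:SH ratios and the SL x SW x SH volume index."""
    if min(seed.length_mm, seed.width_mm, seed.height_mm) <= 0:
        raise ValueError("seed dimensions must be positive")
    return {
        "SL:SW": seed.length_mm / seed.width_mm,
        "SL:SH": seed.length_mm / seed.height_mm,
        # Product of the three dimensions, i.e. a bounding-box volume index
        "volume_index": seed.length_mm * seed.width_mm * seed.height_mm,
    }


# Hypothetical seed, for illustration only (not data from this study)
example = SeedDimensions(length_mm=8.0, width_mm=6.4, height_mm=5.0)
descriptors = shape_descriptors(example)
```

Ratios close to 1 describe a rounder seed, so descriptors of this kind allow shape and size to be treated as separate quantitative traits.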
The estimated heritability for the SL:SW and SL:SH ratios ranges from 0.59 to 0.79. There is no correlation between the SL:SW and SL:SH ratios and cross-sectional area (Cober et al., 1997). The narrow-sense heritability in three recombinant inbred line (RIL) populations ranges from moderate to high, with heritabilities of 0.72 to 0.83 for SH, 0.42 to 0.88 for SW, 0.58 to 0.85 for SL, and 0.44 to 0.88 for calculated seed volume (SL × SW × SH) (Salas et al., 2006). Broad-sense heritability for seed weight in three bi-parental populations is high, with values ranging from 0.76 to 0.93 (Hoeck et al., 2003). The estimated heritabilities for water uptake at 16 h and for the water uptake best-fit curve have been reported to be moderate, at 0.36 and 0.42, respectively, indicating the potential to select for improvement in final water uptake in natto soybean (Cober et al., 2006). Broad-sense heritability for firmness of tofu has been reported as 0.37 and 0.61 in two different experiments (Aziadekey et al., 2002).

QTL for protein, oil content, and seed weight have been widely detected (Diers et al., 1992; Chung et al., 2003; Jun et al., 2008; Teng et al., 2008; Bolon et al., 2010). The positions and effects of QTL for seed protein and oil content and seed weight published up to 2004 were summarized by Hyten et al. (2004). QTL for protein and oil content published after 2004 are summarized here (Table 1.1; Panthee et al., 2005; Liang et al., 2010; Shi et al., 2010). To date, 124 QTL for oil, 108 QTL for protein, 2 QTL for oil/protein, and 119 QTL for seed weight have been published in Soybase (Soybase, 2012). QTL for other traits, such as seed shape and water uptake efficiency, have been less studied (Salas et al., 2006; Xu et al., 2011). In a single study, 26 QTL were detected for seed shape in three densely mapped RIL populations grown in two environments (Salas et al., 2006). There have been no QTL published for water uptake efficiency.
In the present study, QTL detection included seed weight and protein and oil content, but also focused on seed shape, seed size (volume), water uptake efficiency and firmness of tofu. Among all the traits, the firmness of tofu depends on a large number of factors, such as total protein content and the relative abundance of protein subunits (Poysa et al., 2006). While these factors may result in a lack of statistical power to identify marker-trait associations, the resulting data will be useful for identifying lines suitable for food-grade cultivars, as well as for identifying lines with extreme tofu firmness, which may be candidates for a future, more thorough evaluation of seed and protein composition.

Methods and tools for genetic analysis of traits in soybean

The exploration of quantitative traits has been a major area of genetic study for over a century. Knowledge of which genes and/or quantitative trait loci (QTL) are responsible for food-grade soybean traits will facilitate the efficient combination of traits in the development of new cultivars. This process is assisted by the use of molecular markers to construct linkage maps and to conduct genetic analysis of complex quantitative traits. Marker-assisted selection, together with traditional breeding methods, facilitates new cultivar development.

Recently, large numbers of molecular markers have been developed and made available to soybean researchers. An integrated genetic linkage map of soybean was released in 1999 with a total of 606 simple sequence repeat (SSR) markers, mapped in one or more of three populations (Cregan et al., 1999). A new integrated genetic linkage map of soybean was published in 2004, adding 420 new SSRs to the previous map for a total of 1015 SSR markers (Song et al., 2004).
The addition of 1141 transcript genes mapped with single nucleotide polymorphism (SNP) markers provided a valuable resource to soybean breeders, increasing the number of sequence-based markers (Choi et al., 2007). The most current version of the soybean integrated linkage map was developed by adding 2500 SNP markers with the Illumina GoldenGate Assay, which has proven to be a successful tool for high-throughput SNP genotyping of soybean (Hyten et al., 2008; Hyten et al., 2010). A total of 5500 SNP markers were included in the current consensus map of soybean (Hyten et al., 2010). This most recent addition of Illumina GoldenGate SNP markers has been organized into a panel of 1536 markers selected to be informative for genetic analyses in a wide variety of soybean populations (Hyten et al., 2010). These markers were identified by sequencing amplicons from five parental lines, from which 2072 bi-allelic SNPs were selected. With the addition of another 500 SNPs from Choi et al. (2007), a total of 3072 SNP markers were assembled into two custom GoldenGate assays of 1536 SNPs each (Hyten et al., 2010). These two panels were assayed on a diverse array of soybean accessions in order to select a single, optimally informative set of 1536 SNP markers. The published Universal Soy Linkage Panel of 1536 SNPs by Hyten et al. (2010) was used in the proposed research.

Association mapping

As described above, marker-trait associations are commonly identified via QTL mapping in a bi-parental population. The populations studied for soybean linkage mapping have predominantly been bi-parental RIL populations derived from crosses between Minsoy, Noir 1, and Archer (Mansur et al., 1996; Cregan et al., 1999; Song et al., 2004; Hyten et al., 2008; Hyten et al., 2010). However, in conventional bi-parental mapping, the genetic resolution can be limited by the restricted opportunities for recombination during population development.
In addition, bi-parental mapping can be limited by the time and effort required to create and advance large populations; as a result, many published bi-parental mapping populations have relatively small population sizes (Yu and Buckler, 2006; Zhu et al., 2008; Jun et al., 2008). In relation to a breeding program, the primary limitation of bi-parental mapping populations is that the values of QTL tend to be contextual, relying solely on the two alleles present in the parents. In contrast, association mapping can be conducted within a much broader germplasm context, taking advantage of existing individuals, historical recombination events, and the evaluation of a large number of alleles in a single population (Yu and Buckler, 2006; Zhu et al., 2008). The use of association mapping to identify QTL responsible for complex traits has been of great interest in plant breeding research (Zhu et al., 2008). The proposed research will conduct association mapping in a multi-parent inter-crossed breeding population; therefore, the results will be immediately applicable to the Ohio public soybean breeding program.

In recent years, association mapping has been successfully used to detect QTL in soybean and other crops. Two new QTL for protein were detected by using a combination of selective genotyping and association mapping with 150 simple sequence repeat (SSR) markers in a soybean population of accessions from Korea, Japan and China (Jun et al., 2008). Using association mapping, two QTL for iron deficiency chlorosis in soybean were confirmed in two independent populations of 139 lines and 115 advanced breeding lines (Wang et al., 2008). In other crops such as lettuce, association mapping has been used in a population representative of the diversity in cultivated lettuce to assist in pinpointing the resistance gene Tvr1 (Simko et al., 2009).
Association mapping was also used to identify previously known flowering time and pathogen resistance loci in a population representative of the diversity within the model plant Arabidopsis (Aranzana et al., 2005). To detect QTL for kernel size and milling quality in wheat, association mapping was used in a population representative of elite soft winter wheat cultivars grown in the eastern U.S. (Breseghello and Sorrells, 2006). In rice, association mapping was performed in a population of core rice cultivars collected by the USDA to detect genes for stigma and spikelet characteristics (Yan et al., 2009). In maize, association mapping has been conducted in the nested association mapping population to detect genes responsible for leaf architecture traits (Tian et al., 2011). These studies used populations ranging in size from 68 accessions to ~5,000 lines. Markers were either SSRs or SNPs, or a combination of SSRs with indels; the number of markers in each study ranged from 62 to 1.6 million. Due to the nature of the populations that can be used, association mapping is a technique that can identify marker-trait associations that are immediately applicable to breeding programs.

OBJECTIVES

The aim of this project was to detect QTL responsible for traits important in food-grade soybeans, and to provide information for the selection of food-grade soybeans in a breeding program. The objectives included: 1) determining the relationship between the quality of tofu produced and seed characters, including shape, size, density, weight, protein content, and oil content; 2) determining the effect of selection for multiple specialty types on the population structure of a breeding population; 3) determining the values and variation of, and estimating the heritability for, soybean seed quality traits including seed oil and protein content, seed weight, seed shape and firmness of produced tofu; 4) identifying marker-trait associations through association mapping.
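The basic idea of a marker-trait association (objective 4) can be illustrated with a single-marker regression of a phenotype on a marker genotype code, reporting the proportion of phenotypic variance explained (r²). This is only a minimal sketch under simplifying assumptions: inbred lines are coded 0 or 1 for the two homozygous marker classes, and no correction for population structure or kinship is applied, so it is not the mixed linear model (MLM) analysis reported in Chapter 4. The function name and all data values are hypothetical.

```python
def marker_r_squared(genotypes, phenotypes):
    """Proportion of phenotypic variance explained by one marker (simple regression r^2).

    genotypes: marker codes per line (assumed 0/1 for the two homozygous classes)
    phenotypes: trait values per line, in the same order as genotypes
    """
    if len(genotypes) != len(phenotypes) or len(genotypes) < 3:
        raise ValueError("need matching genotype and phenotype lists of length >= 3")
    n = len(genotypes)
    g_mean = sum(genotypes) / n
    p_mean = sum(phenotypes) / n
    s_gg = sum((g - g_mean) ** 2 for g in genotypes)
    s_pp = sum((p - p_mean) ** 2 for p in phenotypes)
    s_gp = sum((g - g_mean) * (p - p_mean) for g, p in zip(genotypes, phenotypes))
    if s_gg == 0 or s_pp == 0:
        # A monomorphic marker or an invariant trait carries no association signal
        return 0.0
    return s_gp ** 2 / (s_gg * s_pp)


# Hypothetical data: six lines, one marker, seed protein content in percent
marker = [0, 0, 0, 1, 1, 1]
protein = [40.0, 41.0, 39.0, 43.0, 44.0, 42.0]
r2 = marker_r_squared(marker, protein)
```

In a structured breeding population, a naive test like this can produce spurious associations, which is why analyses of this kind typically account for population structure and relatedness.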
9 TABLES AND FIGURES QTL Seed size (weight) Oil content Oil content Protein content Oil content Oil content Oil content Oil content Oil content Oil content Oil content Oil content Oil content Protein content Oil content Protein content Map Linkage Marker Chr. position group (cM) LOD No. of environ. 17.52 11.3 3.50 6 157.08 NR† 5.13 2 157.21 NR 5.01 2 Parent 1 Parent 2 1 A1 5 A1 5 C2 6 Satt281 40.3 0.42 NR 2 C2 6 Satt363- 91.87 NR 2.17 2 C2 6 Satt277 93.92 NR 2.25 2 A2 8 Satt162- 14.18 NR 12.61 2 A2 8 BSC 14.21 NR 12.52 2 A2 8 Satt187- 23.24 NR 11.81 2 A2 8 Satt129 25.31 NR 11.02 2 O 10 Satt358 5.44 0.24 NR 2 O 10 Satt479 54.2 12 3.10 6 B1 11 Satt453 123.96 0.14 NR 2 B1 11 Satt453 123.96 0.23 NR 2 F 13 Satt146 1.92 0.4 NR 2 F 13 Satt146 1.92 0.36 NR 2 Protein content F 13 Satt586 3.63 0.19 NR 2 54 US & 51 Asian breeding lines Oil content B2 14 Satt063- 45.14 NR 3.69 2 Jindou 23 B2 14 Satt070 45.21 NR 3.62 2 Jindou 23 D2 17 Satt002 47.73 10 2.90 6 G 18 Satt570 12.21 20.2 3.50 6 Satt225Satt599 Satt225Satt599 References N87-984Panthee et al., TN93-99 16 2005 Huibu Liang et al., Jindou 23 zhi 2010 Huibu Liang et al., Jindou 23 zhi 2010 54 US & 51 Asian Shi et al., 2010 breeding lines Huibu Liang et al., Jindou 23 zhi 2010 Huibu Liang et al., Jindou 23 zhi 2010 Huibu Liang et al., Jindou 23 zhi 2010 Huibu Liang et al., Jindou 23 zhi 2010 Huibu Liang et al., Jindou 23 zhi 2010 Huibu Liang et al., Jindou 23 zhi 2010 54 US & 51 Asian Shi et al., 2010 breeding lines N87-984Panthee et al., TN93-99 16 2005 54 US & 51 Asian Shi et al., 2010 breeding lines 54 US & 51 Asian Shi et al., 2010 breeding lines 54 US & 51 Asian Shi et al., 2010 breeding lines 54 US & 51 Asian Shi et al., 2010 breeding lines D1a Oil content Seed size (weight) Protein content Satt184 r2 Huibu zhi Shi et al., 2010 Liang et al., 2010 Huibu zhi Liang et al., 2010 N87-984Panthee et al., TN93-99 16 2005 N87-984Panthee et al., TN93-99 16 2005 Table 1.1. 
Positional information for oil, protein content and seed size QTL not summarized elsewhere (Hyten et al., 2004). †NR: not reported.
REFERENCES
American Soybean Association. Soy Stats 2012. http://www.soystats.com/2012. Retrieved November 6, 2012.
Aranzana MJ, Kim S, Zhao K, Bakker E, Horton M, Jakob K, Lister C, Molitor J, Shindo C, Tang C, Toomajian C, Traw B, Zheng H, Bergelson J, Dean C, Marjoram P, Nordborg M (2005) Genome-wide association mapping in Arabidopsis identifies previously known flowering time and pathogen resistance genes. PLoS Genetics. 1: 531-539.
Aziadekey M, Schapaugh WT, Herald TJ (2002) Genotype by environment interaction for soymilk and tofu quality characteristics. Journal of Food Quality. 25: 243-259.
Breseghello F, Sorrells ME (2006) Association mapping of kernel size and milling quality in wheat (Triticum aestivum L.) cultivars. Genetics. 172: 1165-1177.
Bolon Y-T, Joseph B, Cannon SB, Graham MA, Diers BW, Farmer AD, May GD, Muehlbauer GJ, Specht JE, Tu ZJ, Weeks N, Xu WW, Shoemaker RC, Vance CP (2010) Complementary genetic and genomic approaches help characterize the linkage group I seed protein QTL in soybean. BMC Plant Biology. 10: 41-64.
Cheng YJ, Thompson LD, Brittin HC (1990) Sogurt, a yogurt-like soybean product development and properties. Journal of Food Science. 55: 1178-1179.
Choi I-Y, Hyten DL, Matukumalli LK, Song Q, Chaky JM, Quigley CV, Chase K, Lark KG, Reiter RS, Yoon M-S, Hwang E-Y, Yi S-I, Young ND, Shoemaker RC, van Tassell CP, Specht JE, Cregan PB (2007) A soybean transcript map: gene distribution, haplotype and single-nucleotide polymorphism analysis. Genetics. 176: 685-696.
Chung J, Babka HL, Graef GL, Staswick PE, Lee GJ, Cregan PB, Shoemaker RC, Specht JE (2003) The seed protein, oil, and yield QTL on soybean linkage group I. Crop Science. 43: 1053-1067.
Clemente TE, Cahoon EB (2009) Soybean oil: genetic approaches for modification of functionality and total content. Plant Physiology. 151: 1030-1040.
Cober ER, Voldeng HD, Fregeau-Reid JA (1997) Heritability of Seed Shape and Seed Size in Soybean. Crop Science. 37: 1767-1769.
Cober ER, Fregeau-Reid JA, Butler G, Voldeng HD (2006) Genotype–Environment analysis of parameters describing water uptake in natto soybean. Crop Science. 46: 2415-2419.
Cregan PB, Jarvik T, Bush AL, Shoemaker RC, Lark KG, Kahler AL, Kaya N, VanToai TT, Lohnes DG, Chung J, Specht JE (1999) An integrated genetic linkage map of the soybean genome. Crop Science. 39: 1464-1490.
Diers BW, Keim P, Fehr WR, Shoemaker RC (1992) RFLP analysis of soybean seed protein and oil content. Theoretical and Applied Genetics. 83: 608-612.
Hoeck JA, Fehr WR, Shoemaker RC, Welke GA, Johnson SL, Cianzio SR (2003) Molecular marker analysis of seed size in soybean. Crop Science. 43(1): 68-74.
Hong K-J, Lee C-H, Kim SW (2004) Aspergillus oryzae GB-107 fermentation improves nutritional quality of food soybeans and feed soybean meals. Journal of Medicinal Food. 7(4): 430-435.
Hyten DL, Pantalone VR, Sams CE, Saxton AM, Landau-Ellis D, Stefaniak TR, Schmidt ME (2004) Seed quality QTL in a prominent soybean population. Theoretical and Applied Genetics. 109: 552-561.
Hyten DL, Song Q, Choi I-Y, Yoon M-S, Specht JE, Matukumalli LK, Nelson RL, Shoemaker RC, Young ND, Cregan PB (2008) High-throughput genotyping with the GoldenGate assay in the complex genome of soybean. Theoretical and Applied Genetics. 116(7): 945-952.
Hyten DL, Choi I-Y, Song Q, Specht JE, Carter TE, Shoemaker RC, Hwang E-Y, Matukumalli LK, Cregan PB (2010) A high density integrated genetic linkage map of soybean and the development of a 1536 universal soy linkage panel for quantitative trait locus mapping. Crop Science. 50: 960-968.
Jun T-H, Van K, Kim MY, Lee SH, Walker DR (2008) Association analysis using SSR markers to find QTL for seed protein content in soybean. Euphytica. 162(2): 179-191.
Keim P, Olson TC, Shoemaker RC (1988) A rapid protocol for isolating soybean DNA.
Soybean Genetics Newsletter. 15: 150-152.
Liang HZ, Yu YL, Wang SF, Lian Y, Wang TF, Wei YL, Gong PT, Liu XY, Fang XJ, Zhang MC (2010) QTL Mapping of isoflavone, oil and protein contents in soybean (Glycine max L. Merr.). Agricultural Sciences in China. 9: 1108-1116.
Liu KS (1997) Soybeans: chemistry, technology, and utilization. New York: Chapman & Hall.
Mansur LM, Orf JH, Chase K, Jarvik T, Cregan PB, Lark KG (1996) Genetic mapping of agronomic traits using recombinant inbred lines of soybean. Crop Science. 36(5): 1327-1336.
Mullin WJ, Xu W (2001) Study of soybean seed coat components and their relationship to water absorption. Journal of Agricultural and Food Chemistry. 49(11): 5331-5335.
Mullin WJ, Fregeau-Reid JA, Butler M, Poysa V, Woodrow L, Jessop DB, Raymond D (2001) An interlaboratory test of a procedure to assess soybean quality for soymilk and tofu production. Food Research International. 34(8): 669-677.
National Agricultural Statistics Service. Quick Stats. http://quickstats.nass.usda.gov. Retrieved September 7, 2011.
Nichols DM, Glover KD, Carlson SR, Specht JE, Diers BW (2006) Fine mapping of a seed protein QTL on soybean linkage group I and its correlated effects on agronomic traits. Crop Science. 46(2): 834-839.
Ohio Soybean Council. International Marketing. http://associationdatabase.com/aws/OHSOY/pt/sp/osc_home. Retrieved September 7, 2011.
Panthee DR, Pantalone VR, West DR, Saxton AM, Sams CE (2005) Quantitative trait loci for seed protein and oil concentration, and seed size in soybean. Crop Science. 45(5): 2015-2022.
Poysa V, Woodrow L (2002) Stability of soybean seed composition and its effect on soymilk and tofu yield and quality. Food Research International. 35: 337-345.
Poysa V, Woodrow L, Yu K (2006) Effect of soy protein subunit composition on tofu quality. Food Research International. 39: 309-317.
Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics. 155: 945-959.
Rao MSS, Mullinix BG, Rangappa M, Cebert E, Bhagsari AS, Sapra VT, Joshi M, Dadson RB (2002) Genotype × environment interactions and yield stability of food-grade soybean genotypes. Agronomy Journal. 94(1): 72-80.
Salas P, Oyarzo-Llaipen JC, Wang D, Chase K, Mansur L (2006) Genetic mapping of seed shape in three populations of recombinant inbred lines of soybean (Glycine max L. Merr.). Theoretical and Applied Genetics. 113(8): 1459-1466.
Shi A, Chen P, Zhang B, Hou A (2010) Genetic diversity and association analysis of protein and oil content in food-grade soybeans from Asia and the United States. Plant Breeding. 129(3): 250-256.
Simko I, Pechenick DA, McHale LK, Truco MJ, Ochoa OE, Michelmore RW, Scheffler BE (2009) Association mapping and marker-assisted selection of the lettuce dieback resistance gene Tvr1. BMC Plant Biology. 9(1): 135.
Song QJ, Marek LF, Shoemaker RC, Lark KG, Concibido VC, Delannay X, Specht JE, Cregan PB (2004) A new integrated genetic linkage map of the soybean. TAG Theoretical and Applied Genetics. 109(1): 122-128.
Soybase. Map QTL. www.soybase.org. Retrieved November 06, 2012.
Teng W, Han Y, Du Y, Sun D, Zhang Z, Qiu L, Sun G, Li W (2008) QTL analyses of seed weight during the development of soybean (Glycine max L. Merr.). Heredity. 102(4): 372-380.
Tian F, Bradbury PJ, Brown PJ, Hung H, Sun Q, Flint-Garcia S, Buckler ES (2011) Genome-wide association study of leaf architecture in the maize nested association mapping population. Nature Genetics. 43(2): 159-162.
Wang J, McClean PE, Lee R, Goos RJ, Helms T (2008) Association mapping of iron deficiency chlorosis loci in soybean (Glycine max L. Merr.) advanced breeding lines. Theoretical and Applied Genetics. 116(6): 777-787.
Wei Q, Chang SKC, Characteristics of fermented natto products as affected by soybean cultivars. Journal of Food Processing Preservation. 28: 251-273.
Xu Y, Li HN, Li GJ, Wang X, Cheng LG, Zhang YM (2011) Mapping quantitative trait loci for seed size traits in soybean (Glycine max L. Merr.). Theoretical and Applied Genetics. 122(3): 581-594.
Yan WG, Li Y, Agrama HA, Luo D, Gao F, Lu X, Ren G (2009) Association mapping of stigma and spikelet characteristics in rice (Oryza sativa L.). Molecular Breeding. 24(3): 277-292.
Yu JM, Buckler ES (2006) Genetic association mapping and genome organization of maize. Current Opinion in Biotechnology. 17: 155-160.
Zhang B, Chen P, Florez-Palacios SL, Shi A, Hou A, Ishibashi T (2010) Seed quality attributes of food-grade soybeans from the US and Asia. Euphytica. 173(3): 387-396.
Zhu CS, Gore M, Buckler ES, Yu J (2008) Status and prospects of association mapping in plants. The Plant Genome. 1(1): 5-20.
CHAPTER 2
CORRELATIONS OF SEED TRAITS WITH TOFU CHARACTERISTICS IN 48 SOYBEAN CULTIVARS AND BREEDING LINES
Abstract
In comparison to commodity or field-grade soybeans [Glycine max (L.) Merr.], food-grade soybeans, used to produce tofu and other soy-food products, have specific seed composition, shape, size and color requirements. Many of these seed qualities, such as protein content, have been correlated with tofu texture. To study these correlations, tofu was produced from 48 high-protein or food-grade type soybean cultivars and breeding lines grown in two locations. Four tofu textural traits were assessed: work to break, brittleness, stiffness, and gel strength. Seed traits measured included seed protein content, oil content, weight, volume, density, and shape. Correlation analysis was conducted between tofu textural traits and soybean seed traits. Seed protein and oil content were both significantly correlated with these tofu textural traits, with the exception of brittleness.
No significant correlations between tofu textural traits and seed volume or shape were detected, implying that the preference for large, round seeds by tofu producers is unrelated to the texture of the tofu produced under these conditions.
INTRODUCTION
Because tofu is an inexpensive, high-protein food, demand for it is increasing (Dimitri and Greene, 2002; Chianu et al., 2010; Yamaura, 2011). The textural qualities of tofu are important for consumer acceptance as well as for the marketing classification or type of tofu (Golbitz et al., 2006). Many factors can affect tofu texture; these include external factors associated with the tofu-making process as well as factors which are intrinsic to the seed and may be associated with the soybean cultivar or seed production environment (Min et al., 2005; Kumar et al., 2006). Instrumental measurements of tofu textural properties have been correlated with the descriptive sensory scores of tofu quality given by trained panelists (Yuan and Chang, 2007). Thus, instrumental texture units have become a standard method of reporting the textural quality of tofu (e.g. Poysa and Woodrow, 2002; Mujoo et al., 2003; Liu and Chang, 2004).
External factors affecting tofu quality include the soymilk processing, coagulant properties, other additives, and the pressure applied to form the tofu curd (Gandhi and Bourne, 1988; Sun and Breene, 1991; Johnson and Wilson, 1984; Cai et al., 1997; Hou et al., 1997; Cai and Chang, 1998; Cai and Chang, 1999; Liu and Chang, 2004). In terms of processing the soymilk, tofu yield and texture are affected by the temperature and length of time used for soaking and grinding the soybean seeds, as well as the heating method of the soymilk (Johnson and Wilson, 1984; Cai and Chang, 1999; Liu and Chang, 2004). The coagulation of soymilk into curd is one of the most critical processes in tofu production.
The type of coagulant used, its concentration, and the method and timing of its incorporation all influence the texture of tofu produced (Sun and Breene, 1991; Cai et al., 1997; Hou et al., 1997; Cai and Chang, 1998; Liu and Chang, 2004). Increased pressure applied to form tofu curd results in lower tofu moisture and yield (Gandhi and Bourne, 1988). However, springiness, cohesiveness, adhesiveness, and stringiness are only minimally affected by the processing pressure (Gandhi and Bourne, 1988). In addition, soymilk additives, such as transglutaminase, can increase tofu firmness (Kwan and Easa, 2003; Yasir et al., 2006).
Factors intrinsic to the seed which may affect tofu yield and/or texture include, among a number of factors, the seed protein content and composition (Cai et al., 1997; Cai and Chang, 1999; Poysa and Woodrow, 2002; Shih et al., 2002; Mujoo et al., 2003; Poysa et al., 2006). Tofu is produced from what are known as food-grade soybeans, which are characterized by a clear or yellow hilum, high protein content, and a large, round seed shape (Graef and Specht, 1989; Griffis and Wiederman, 1990). Food-grade soybeans are distinct from commodity or field-grade seed, which is generally smaller, has lower protein content, and displays a dark hilum and oblong shape. Higher protein content in soymilk results in a springier tofu texture, in addition to being positively correlated with tofu yield and firmness (Shih et al., 2002; Cai et al., 1997; Poysa and Woodrow, 2002). The relative concentration of the 11S and 7S protein subunits affects tofu yield and texture; however, the effect is dependent on the tofu processing method and coagulant used (Cai and Chang, 1999; Mujoo et al., 2003; Poysa et al., 2006).
Although the tofu market prefers large, round seeds (Graef and Specht, 1989), it remains unclear whether seed size or shape of soybean has a functional impact on tofu quality, or whether the preference for large, pearl-like soybean seed is related solely to tofu yield or is sociological rather than functional in origin. Reports on this subject have been inconsistent, possibly due to the many factors which may affect the texture of tofu. Studies which have used seed weight as an estimate for seed size have concluded that seed size has no effect on the physicochemical properties, yield, and firmness of tofu (Lim et al., 1990; Wang and Chang, 1995; Cai et al., 1997). Alternatively, seed size, calculated from seed straight length and width, has been positively correlated to tofu yield, as well as to soymilk concentration, which is directly related to tofu firmness (Shih et al., 1997; Shih et al., 2002).
The aims of the present study were 1) to determine whether there is a correlation between seed size and shape characteristics and the textural quality of the tofu produced, and to confirm, in the present population, the known correlation between protein and tofu texture shown in previous studies; 2) to determine whether the textural qualities of tofu are heritable in the present population; and 3) to determine whether tofu texture can be improved genetically through indirect selection on correlated traits. To address these aims, we measured seed volume, shape, weight, and density as well as protein and oil content for seed from 48 soybean cultivars or breeding lines grown in two different environments in Ohio. Tofu was produced using a bench-top methodology (Liu, 1997; Evans et al., 1997; Mullin et al., 2001) for each cultivar or breeding line from both locations. Textural quality measurements of our tofu (work to break, brittleness, stiffness, and gel strength) were correlated with the seed measurements.
Genetic variation was also estimated for these traits, and genetic values estimated by best linear unbiased predictors (BLUPs) were correlated. Our findings reproduce the well-established correlation between seed protein content and the textural qualities of the tofu produced, but identify little to no correlation between textural qualities and seed size, weight, shape, and density.
MATERIALS AND METHODS
Seed material
A collection of 250 soybean breeding lines and check cultivars was grown at two locations in Ohio, Hoytville and Wooster, in 2010. Lines were grown in trials for either maturity group (MG) II or III and in either a preliminary or an advanced trial. Preliminary trials were composed of breeding lines in the F4:5 generation; advanced trials were composed of breeding lines in the F4:6-8 generations. Lines for each trial were grown in a randomized complete block design with two replicates for the preliminary lines or three replicates for the advanced lines. The 25 lines with the highest protein content (see ‘Seed measurements’ section below) were selected for analysis of seed traits and for tofu production and textural analysis. In addition to these lines, check cultivars used for the food-grade market were included in the analysis, as well as breeding lines with a food-grade cultivar in their pedigree. In total, 48 lines were selected for analysis of seed traits and tofu production and textural analysis (Table 2.1).
Seed measurements
Seed oil and protein concentration were measured from all replicates at the National Center for Agricultural Utilization Research in Peoria, IL using Near Infrared (NIR) technology with an Infratec 1255 Food & Feed Analyzer (UltraTec Manufacturing Inc., Santa Ana, CA). Seed volume was measured from a single replicate by water displacement, by placing fifty dry seeds into a 50 ml graduated cylinder containing 25 ml of water. Average volume per seed was determined by dividing the displaced volume by 50.
Seed weight was measured with 100 seeds from each replicate with an OHAUS Adventurer™ Pro, Model AV212 scale. Average weight per seed was determined by dividing the 100-seed weight by 100. The average seed weight divided by the average seed volume was recorded as seed density. Seed straight length, straight width, and length to width ratio were measured by scanning ten seeds from a single replicate followed by image analysis with WinSEEDLE software (Regent Instruments Inc., Canada).
Tofu production
Tofu firmness data were collected from the 48 selected lines (Table 2.1) from a single replicate from each location. The method of making tofu was adapted from Liu (1997), Evans et al. (1997), and Mullin et al. (2001) and is described in Figure 2.1. Seeds (100 g) from the selected lines were soaked in 250 ml water at 20-22 °C for 16 hours to take up water to approximately 2.1 times the original seed weight. Soaked seeds were then combined with 400 ml distilled water and blended for 3 minutes at high speed (blender model 908-2, Hamilton Beach Co., Washington, NC). The bean slurry was filtered through two layers of cheesecloth and squeezed manually to obtain soymilk. The okara, or bean residue, was washed and mixed with a further 150 ml of water so that the final ratio between raw beans and total water used was estimated to be 1:8. Total soymilk obtained from the slurry (300 ml) was heated to 95 °C and held for five minutes while vigorously stirred with a magnetic stirring bar. Heated soymilk was poured into a glass jar and allowed to cool to 87 °C, at which point calcium sulfate (CaSO4·2H2O) was added to the soymilk to a concentration of 3.49 mM. The mixture was stirred vigorously for about three seconds and incubated at room temperature for 30 minutes to allow the formation of tofu curd. In order to account for the variation in curd production, two separate tofu curds were made for each cultivar from each growing environment.
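The per-seed quantities under 'Seed measurements' and the bean-to-water ratio under 'Tofu production' reduce to simple ratios. The Python sketch below is only an illustration of that arithmetic, not the procedure used in this study. The function names are hypothetical, and the example inputs for seed weight and displaced volume are invented values chosen for readability.

```python
# Illustrative sketch only: function names and example inputs are hypothetical.

def per_seed_weight_g(hundred_seed_weight_g):
    """Average weight per seed (g) from a 100-seed weight."""
    return hundred_seed_weight_g / 100.0

def per_seed_volume_cm3(displaced_ml, n_seeds=50):
    """Average volume per seed from water displaced by n_seeds (1 ml = 1 cm^3)."""
    return displaced_ml / n_seeds

def seed_density_g_per_cm3(hundred_seed_weight_g, displaced_ml, n_seeds=50):
    """Seed density: average seed weight divided by average seed volume."""
    return per_seed_weight_g(hundred_seed_weight_g) / per_seed_volume_cm3(
        displaced_ml, n_seeds
    )

def water_per_gram_of_beans(bean_mass_g, *water_ml):
    """Total water (ml) used per gram of dry beans."""
    return sum(water_ml) / bean_mass_g

# Invented example: 100 seeds weigh 16.0 g and 50 seeds displace 6.5 ml.
density = seed_density_g_per_cm3(16.0, 6.5)

# Protocol volumes from the text: 250 ml soak + 400 ml blend + 150 ml okara wash
# for 100 g of seed, which gives the stated 1:8 ratio.
ratio = water_per_gram_of_beans(100.0, 250.0, 400.0, 150.0)

print(f"density = {density:.3f} g/cm^3, beans:water = 1:{ratio:.0f}")
```

Keeping seed count and bean mass as parameters makes it straightforward to rerun the same calculation if a different sample size is used.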
Textural analysis of tofu
Measurements of tofu firmness were made on the formed curd within the jar with a Texture Analyzer Measuring System (Model TA.XT2, Texture Technologies Corp., Scarsdale, NY/Stable Micro Systems, Godalming, Surrey, England) employing a penetration test with a spherical stainless steel probe with a diameter of 1.905 cm (TA 18 A 3/4" dia ball probe; Figure 2.1) (Mullin et al., 2001). The test speed was 1.00 mm/s, with 200 data points collected per second. The trigger force to detect the sample surface was 5 gf. For each measurement with the Texture Analyzer Measuring System, force was plotted against time. Textural traits calculated from this plot included work to break, brittleness, stiffness, and gel strength. Gel strength, sometimes referred to as hardness, was defined as the peak value of the force applied at the breaking point of the tofu sample (b in Figure 2.2). Brittleness was the distance from initial contact to the break point (c × 1.00 mm/s in Figure 2.2). Stiffness, sometimes referred to as firmness, was the slope from the initial contact point to the break point ([(b - a) / c] in Figure 2.2). Work to break was the definite integral, or area under the curve, from the initial contact point (0, a) to the break point (c, b) (Figure 2.2).
Statistical Analyses
In order to obtain the phenotypic value for each sample (48 cultivars grown in two locations), measurements of tofu texture and seeds were analyzed in SAS (SAS Institute Inc., Cary, NC) by calculating least-squares means (LSmeans) using PROC MIXED. The model for LSmeans was $Y_{ik} = \mu + G_i(L_k) + L_k + \varepsilon_{ik}$, where $Y_{ik}$ is the observed value for a given trait, $\mu$ is the overall mean, $G_i(L_k)$ is the $i$th cultivar or breeding line in the $k$th environment, representing each sample effect, $L_k$ is the effect of the $k$th environment, and $\varepsilon_{ik}$ is the error associated with the observation. $G_i(L_k)$ was treated as a fixed effect; $L_k$ was treated as a random effect.
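The four textural traits defined under 'Textural analysis of tofu' can be computed from a sampled force-time curve. The following Python sketch is a minimal illustration under stated assumptions, not the Texture Analyzer software's own calculation. It assumes that time zero is the initial contact point, that the break point is the sample with the highest force, and that c is measured in seconds, as the stiffness units (gf·s⁻¹) imply. The function name `texture_traits` and the example curve are hypothetical.

```python
# Illustrative sketch only: the function name and example data are hypothetical.

def texture_traits(times_s, forces_gf, speed_mm_per_s=1.0):
    """Compute tofu textural traits from a force-time penetration curve.

    times_s   -- sample times in seconds, starting at initial probe contact
    forces_gf -- force readings (gf) at those times
    Assumption: the break point is the sample with the highest force.
    """
    if len(times_s) != len(forces_gf) or len(times_s) < 2:
        raise ValueError("need two or more paired time and force samples")

    # Index of the break point (first occurrence of the peak force).
    i_break = max(range(len(forces_gf)), key=forces_gf.__getitem__)
    if i_break == 0:
        raise ValueError("peak force at initial contact; no break point found")

    a = forces_gf[0]                     # force at initial contact
    b = forces_gf[i_break]               # peak force at the break point
    c = times_s[i_break] - times_s[0]    # time from contact to break (s)

    # Area under the curve from contact to break, by the trapezoidal rule.
    work = sum(
        0.5 * (forces_gf[i] + forces_gf[i + 1]) * (times_s[i + 1] - times_s[i])
        for i in range(i_break)
    )

    return {
        "gel_strength_gf": b,
        "brittleness_mm": c * speed_mm_per_s,
        "stiffness_gf_per_s": (b - a) / c,
        "work_to_break_gf_s": work,
    }

# Invented curve: force rises linearly from 5 gf to 45 gf, then drops.
traits = texture_traits([0, 1, 2, 3, 4, 5], [5, 15, 25, 35, 45, 20])
print(traits)
```

With real instrument output sampled at 200 points per second, the same function applies unchanged; only the rule for locating the break point might need to be adapted to the curve shape.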
Using LSmeans, Pearson’s correlation coefficients were calculated between tofu textural traits; between eight seed traits and four tofu textural traits; and also between seed protein content and oil content. Significance levels of correlations were corrected for the 39 multiple comparisons by Bonferroni’s method. Histograms of LSmeans for all traits and scatter plots for all pairwise LSmeans comparisons were obtained using R. The coefficient of variation for tofu textural traits was calculated in R.
In order to address the questions of heritability and the value of indirect selection for improvement of tofu texture, the data were further decomposed into the specific sources of variance, and best linear unbiased prediction (BLUP) values were calculated with PROC MIXED in SAS using the model $Y_{ijkl} = \mu + G_i + L_k + T_l(L_k) + L_k \times G_i + \varepsilon_{ijkl}$, where $\mu$ represents the grand mean, $G_i$ is the effect of the $i$th cultivar or breeding line, $L_k$ is the effect of the $k$th environment, $T_l(L_k)$ is the effect of the $l$th trial within the $k$th environment, $L_k \times G_i$ is the interaction effect of the $k$th environment with the $i$th cultivar or breeding line, and $\varepsilon_{ijkl}$ is the error associated with the observation. Pearson’s correlation coefficients between BLUP values were calculated in R. Significance levels of correlations were corrected for the 24 multiple comparisons by Bonferroni’s method. Variance components were estimated using the REML method. For tofu texture traits, the proportion of genetic variance to total observed variance was calculated with the combined data set for both environments.
RESULTS
Seed measurements
Seed measurements collected for each of the 48 cultivars or breeding lines included seed weight, volume, density, straight length, straight width, and the length to width ratio as well as protein and oil content.
Q-Q plots and Anderson-Darling normality tests were conducted as an exploratory step (results not shown); with the exception of seed density and seed oil, LSmeans for all seed traits were significantly different from a normal distribution, as displayed in Figure 2.3. LSmeans for density ranged from 1.00 to 1.47 g·cm⁻³ and for seed oil content ranged from 16.85 to 21.73% (Figure 2.3). LSmeans for the other seed traits, which had non-normal distributions, ranged from 11.12 to 21.18 g for seed weight, 0.092 to 0.179 cm³ for volume, 5.81 to 8.45 mm for straight length, 5.22 to 7.17 mm for straight width, 1.06 to 1.33 for length to width ratio, and 37.89 to 47.46% for protein content (Figure 2.3). There was a significant negative correlation between the LSmeans for seed protein and oil content (r = -0.82, adjusted p = 8.6 × 10⁻¹⁵; Figure 2.4).
Textural qualities of tofu
Tofu texture was studied for curds produced from the same set of 48 breeding lines or cultivars included in this experiment. The textural measurements included work to break, brittleness, stiffness, and gel strength. LSmeans were calculated for each sample (breeding line or cultivar from each environment). For all four tofu texture traits, the LSmeans were normally distributed according to Anderson-Darling normality tests (results not shown), with ranges from 303.36 to 1284.18 gf·s for work to break, 9.18 to 14.56 gf·s⁻¹ for stiffness, 5.37 to 16.48 mm for brittleness, and 4.19 to 5.44 gf for gel strength (Figure 2.3). With the exception of brittleness and stiffness, which were not significantly correlated with each other, the textural traits were highly correlated with one another (Figure 2.5; Table 2.3).
To estimate the genotypic value of a trait for each breeding line or cultivar, a general linear model which included the genotype by environment interactions was applied to the combined data from both locations.
The BLUP values for all tofu textural traits lacked significant genetic variance and possessed significant genotype by environment interaction effects (Table 2.2). The proportion of total variance explained by genetic variance for work to break, stiffness, and gel strength was low (0.10 to 0.17), indicating only a minimal possibility for improvement through direct phenotypic selection.
Correlations between textural traits of tofu and seed measurements
The correlations between textural traits of tofu and the seed measurements were calculated using both LSmeans and BLUP values. LSmeans for tofu textural traits and seed traits were correlated to explore the physical relationship between seed traits and tofu texture (Table 2.4). Scatter plots for the LSmeans correlations are displayed in Figure 2.6. LSmeans for work to break, stiffness, and gel strength of tofu were significantly positively correlated with LSmeans for seed protein content (adjusted p = 1.4 × 10⁻³, 7.8 × 10⁻⁶, 5.7 × 10⁻⁴, respectively; Table 2.4). LSmeans for stiffness and gel strength were significantly negatively correlated with seed oil content (adjusted p = 1.7 × 10⁻³, 3.8 × 10⁻², respectively; Figure 2.6, Table 2.4). No tofu textural trait was significantly or highly correlated with any of the seed size or seed shape traits (Figure 2.6 and Table 2.4).
BLUP values for tofu textural traits and seed traits were correlated to determine the feasibility of indirect selection for improvement of tofu texture (Table 2.4). No BLUP values were calculated for brittleness, which had an estimated genetic variance of 0 (Table 2.2). BLUP values for work to break, stiffness, and gel strength of tofu were significantly positively correlated with BLUP values for seed protein content (adjusted p = 2.4 × 10⁻³, 1.6 × 10⁻³, 3.8 × 10⁻³, respectively; Table 2.4).
BLUP values for stiffness and gel strength had a significant negative correlation with BLUP values for seed oil content (adjusted p = 9.3 × 10⁻³, 2.6 × 10⁻², respectively; Table 2.4).
DISCUSSION
With the exception of brittleness and stiffness, which were not significantly correlated in this study, each of the tofu textural traits was highly correlated with the other textural traits. This is largely due to the mathematical relationship among the measurements, with force, time, and distance to the fracture point being the critical numbers (Figure 2.2).
When data from the two locations were combined, work to break, stiffness, and gel strength exhibited non-significant genetic variance. While few studies have examined the genetic variance in textural traits of tofu, significant genetic variance of gel strength (hardness) has been reported (Mullin et al., 2001; Poysa et al., 2006). Mullin et al. (2001) reported significant genetic variance of gel strength by compression and penetration and of firmness by penetration based on four soybean varieties grown in a single environment. In a study with a series of glycinin and β-conglycinin subunit null lines in the ‘Harovinton’ (Buzzell et al., 1991) genetic background, Poysa et al. (2006) reported significant genetic variance of gel strength and firmness by compression and penetration among genotypes grown in two environments and across two years.
It has frequently been reported that the firmness and/or gel strength of tofu are positively correlated with the protein content of soymilk, which is highly correlated with soybean seed protein content (Lim et al., 1990; Shen et al., 1991; Cai et al., 1997; Poysa and Woodrow, 2002; Shih et al., 2002). This finding has been further verified in the present study, where stiffness and hardness were significantly correlated to seed protein content. In confirmation of results reported by Poysa and Woodrow (2002), these textural qualities of tofu are also negatively correlated to seed oil content.
This is not unexpected, as there is a long-established negative correlation between oil and protein content in soybean seed (Johnson and Bernard, 1962), with correlation coefficients ranging from 0.18 to -0.62 (Yaklich et al., 2002). The present study found a similar negative correlation (r = -0.82) between seed oil and protein content. The positive correlation between seed protein content and work to break has not been reported before; however, it is expected that work to break will follow trends similar to tofu stiffness and gel strength because, as discussed above, these measurements are highly correlated and dependent on the force and distance required to reach the fracture point. The significant correlations between BLUP values for seed protein content and for tofu texture traits indicate that selection for higher protein in soybean seeds will result in greater firmness, gel strength and stiffness in tofu. Given the large amount of time and labor involved and the relatively large amount of seed required for tofu production, it follows that high protein content should be and is a breeding goal (Panthee et al., 2005) in food-grade cultivar development.
We report that brittleness exhibits no significant genetic variance. While significant differences for brittleness as measured by compression have been reported with the use of different coagulants (Hou et al., 1997), in concordance with the present study, it has also previously been reported that there was little to no variation in brittleness measurements from tofu produced from three different cultivars (Shih et al., 2002).
In addition to high protein, large and round seed are important characteristics considered by tofu producers (Poysa et al., 2002; Salas et al., 2006); however, this study did not detect evidence that seed size and shape significantly affect the textural quality of the tofu.
It should be noted that the present study looked at only one tofu production system and did not measure the yield of soymilk or tofu. While seed size and shape may have importance in other production methods, to soymilk or tofu yield, or have sociological importance in the food-grade soybean market, selection for these traits is unlikely to improve the textural quality of tofu as measured in this study. Work to break, stiffness, and gel strength all presented a low proportion of genetic variance contributing to total variance, indicating that improvement in textural quality of tofu could be limited through direct selection on these traits and methods of indirect selection should be further explored. Credits: Seed protein and oil measurements were conducted by NCAUR. All other work was conducted by Mao Huang with assistance from Amanda Gutek with making tofu samples, and assistance from additional members of the McHale lab with seed shape and 29 size measurements. Dr. Kaletunc assisted with textural measurements. Dr. Steve St. Martin assisted with statistical data analysis. 30 TABLES AND FIGURES Line or cv. 
name Dennison ‡ HC95-15MB‡ HC95-24MB IA3024‡ IA3027 ‡ LD00-3309 ‡ Ohio FG1 Ohio FG3 Ohio FG4 Ohio FG5 Prohio ‡ Williams‡ Wyandot H05-221 H09-106 ‡ H09-121‡ H09-179 H09-260‡ H09-264 H09-277 ‡ H09-4 ‡ H09-6‡ H09-9 HS5-3519 HS7W-190 HS7W-191 HS7W-94 ‡ HS8-3284‡ HS8-3331 ‡ HS8-3451 HS8-3538 HS8-3713 HS8W-102 HS8W-103 HS8W-178 HS8W-179 HS8W-514 HS8W-517 HS8W-520 HS8W-521 Pedigree ‘Athow’ x HS94-4533 ‘Maverick' x 'Dwight' LS301 x HS84-6247 HS89-8843 x Ohio FG1 ‘Ohio FG1’ x HS89-3078 HC94-81PR x Asgrow2506 ‘Wayne' x L57-0034 OXR-96243 x Ohio FG1 HCS95-15MB x Williams SF114 x Wyandot SF114 x Ohio FG5 SF114 x Ohio FG5 SF114 x Ohio FG5 SF114 x Ohio FG5 SF114 x Ohio FG5 SF114 x Wyandot SF114 x Wyandot SF114 x Wyandot HS99-4577 x PI133293 HS1-3641 x HS1-3907 HS1-3641 x HS1-3907 HS1-5870 x Ohio FG4 ‘Kottman’ x HS1-7267 HS3-25233 x (Williams x PI 424354) HS3-25233 x (Williams x PI 424354) Dennison2 x HFPR4 C2033 x HS1-423 HS2-8086 x HS1-7267 HS2-8086 x HS1-7267 HS1-424 x Ohio FG4 HS1-424 x Ohio FG4 Dennison x N98-4445A Dennison x N98-4445A Dennison x N98-4445A Dennison x N98-4445A Reference† St. Martin et al., 2008 Cooper & Hammond, 1999 Cooper & Hammond, 1999 Iowa State Univ. Iowa State Univ. Diers et al., 2006 St. Martin et al., 1994 St. Martin et al., 2004 OSU-OARDC St. Martin et al., 2006 Mian et al., 2008 Bernard & Lindhal, 1972 OSU-OARDC Continued Table 2.1. Phenotyped cultivars and breeding lines. † A reference is provided for released cultivars. ‡ Phenotypic data were only included from one location (either Wooster or Hoytville). 
Table 2.1 continued
Line or cv. name: OHS 306; M09-W033; M09-W039; M09-W043; M09-W095; M09-W104; M09-W105; Wyandot Rg2
Pedigree: HS98-3409 x HS99-5021; OHS 304 x LG00-3372; HS0-3243 x Dennison; HS0-3243 x Dennison; HS0-3243 x Dennison; HS0-3243 x Dennison; HS0-3243 x Dennison; Wyandot x PI 243540
Reference†: OSU-OARDC

Trait            P-value (σ²G)    P-value (σ²G×E)    σ²G/σ²T    CV%†
Work to break    0.18             0.0085‡            0.10       37.02
Brittleness      n.e.§            0.087              0          14.73
Stiffness        0.11             0.0024             0.16       29.76
Gel strength     0.11             0.0032             0.17       6.89

Table 2.2. Variance for tofu textural traits. The proportion of the observed total variance (σ²T) explained by genetic variance (σ²G) for BLUP values of tofu textural traits and the coefficient of variation (CV%) are presented. Significance levels of genetic variance (σ²G) and genotype × environment variance (σ²G×E) were determined by Wald z-test in SAS. † CV% is calculated across all samples for each trait with raw data from each genotype and location. ‡ Significant values (α = 0.05) for σ²G and σ²G×E are italicized for emphasis. § Not estimable.

Brittleness Stiffness Gel strength 0.84*** 0.93*** Area 0.53*** 0.060 0.36** Brittleness 0.93*** Stiffness

Table 2.3. Pearson’s correlation coefficient between textural traits of tofu. Correlation analysis was conducted based on LSmeans of each trait. Significance of Pearson’s correlation coefficient was tested by t-test and corrected for multiple comparisons using Bonferroni’s method; significance is indicated by * for adjusted p ≤ 0.05, ** for adjusted p ≤ 0.01, and *** for adjusted p ≤ 0.001.

Trait: (a) Work to break, Brittleness, Stiffness, Gel strength; (b) Work to break, Stiffness, Gel strength
Weight, Volume, Density, Straight length, Straight width, Length:width: 0.065 -0.0067 0.081 0.072 0.065 0.084 0.025 0.050 0.0010 -0.21 0.14 0.059 -0.058 0.25 -0.17 -0.043 0.033 0.25 -0.088 0.0093 0.014 0.072 0.061 -0.022 -0.0077 0.017 0.12 0.18 0.11 -0.069 -0.084 -0.019 -0.008 -0.25 -0.059 -0.25 0.0054 -0.18 -0.17 0.091 -0.20 -0.14
Protein, Oil: 0.45** 0.0049 0.55*** 0.47*** -0.34 -0.027 -0.44** -0.36* 0.53** 0.60*** 0.52** -0.46* -0.49* -0.41

Table 2.4. Pearson’s correlation coefficient between tofu textural measurements (left side) and seed measurements (top). Significance levels are as described in Table 2.3. (a) Correlation analysis conducted based on LSmeans of each trait. (b) Correlation analysis conducted based on BLUP values.

Figure 2.1. Workflow of tofu production. Methods adapted from Liu (1997), Evans et al. (1997), and Mullin et al. (2001).

Figure 2.2. Force-deformation plot of tofu penetration analysis. The zero time point is equal to the initial contact of the probe with the surface of the tofu. The dashed line indicates the break point; (a) force at the initial contact point, (b) peak force at the break point, (c) distance from initial contact to the break point.

[Axis label: Seed protein content (%)]
Figure 2.3. Histograms of LSmeans for (a) tofu texture traits and (b) seed traits. The y-axis represents the count of breeding lines and cultivars at each location. The mean value for each trait is illustrated with a vertical red line.

[Axis label: Seed oil content (%)]
Figure 2.4. Scatter plots comparing LSmeans of protein and oil content.

[Axis labels: Work to break (gf∙s); Brittleness (mm); Stiffness (gf∙s⁻¹); Gel strength (gf)]
Figure 2.5. Scatter plots comparing LSmeans of tofu textural traits.

Figure 2.6. Scatter plots comparing LSmeans of tofu textural traits (y-axis) versus seed traits (x-axis).

REFERENCES

Bernard RL, Lindahl DA (1972) Registration of Williams Soybean (Reg. No. 94). Crop Science. 12: 716.
Buzzell RI, Anderson TR, Hamill AS, Welacky TW (1991) Harovinton soybean. Canadian Journal of Plant Science. 71: 525-526.
Cai TD, Chang KC (1998) Characteristics of production-scale tofu as affected by soymilk coagulation method: propeller blade size, mixing time and coagulant concentration. Food Research International. 31(4): 289-295.
Cai TD, Chang KC (1999) Processing effect on soybean storage proteins and their relationship with tofu quality. Journal of Agricultural and Food Chemistry. 47(2): 720-727.
Cai TD, Chang KC, Shih MC, Hou HJ, Ji M (1997) Comparison of bench and production scale methods for making soymilk and tofu from 13 soybean varieties. Food Research International. 30(9): 659-668.
Chianu JN, Zegeye EW, Nkonya EM (2010) Global Soybean Marketing and Trade: a Situation and Outlook Analysis. In: The Soybean: Botany, Production and Uses. G. Singh, Ed. CAB International: Wallingford, England.
Cooper RL, Hammond RB (1999) Registration of Insect-Resistant Soybean Germplasm Lines HC95-24MB and HC95-15MB. Crop Science. 39: 599.
Diers BW, Cary TR, Thomas DJ, Nickell CD (2006) Registration of ‘LD00-3309’ soybean. Crop Science. 46: 1384.
Dimitri C, Greene C (2002) Recent growth patterns in the US organic foods market. Agriculture Information Bulletin. 777.
Evans DE, Tsukamoto C, Nielson NC (1997) A small scale method for the production of soymilk and silken tofu. Crop Science. 37: 1463-1471.
Gandhi AP, Bourne MC (1988) Effect of pressure and storage time on texture profile parameters of soybean curd (tofu). Journal of Texture Studies. 19: 137-142.
Golbitz P, Jordan J (2006) Soyfoods: Market and Products. In: Soy Applications in Food. Riaz, M. N., Ed. Taylor & Francis: Boca Raton, FL.
Graef GL, Specht JE (1989) Fitting the niche food grade soybean production: a new opportunity for Nebraska soybean producers. Nebraska Department of Agriculture, Lincoln, pp 18–27.
Griffis G, Wiedermann L (1990) Marketing food-quality soybeans in Japan, 3rd edn. American Soybean Association, St. Louis.
Hou HJ, Chang KC, Shih MC (1997) Yield and textural properties of soft tofu as affected by coagulation method. Journal of Food Science. 62(4): 824-827.
Johnson HW, Bernard RL (1962) Soybean genetics and breeding. Advances in Agronomy. 14: 149-221.
Johnson LD, Wilson LA (1984) Influence of soybean variety and the method of processing in tofu manufacturing: comparison of methods for measuring soluble solids in soymilk. Journal of Food Science. 49(1): 202-204.
Kumar V, Rani A, Solanki S, Hussain SM (2006) Influence of growing environment on the biochemical composition and physical characteristics of soybean seed. Journal of Food Composition and Analysis. 19(2): 188-195.
Kwan SW, Easa AM (2003) Comparing physical properties of retort-resistant glucono-δ-lactone tofu treated with commercial transglutaminase enzyme or low levels of glucose. LWT-Food Science and Technology. 36(6): 643-646.
Lim BT, DeMan JM, DeMan L, Buzzell RI (1990) Yield and quality of tofu as affected by soybean and soymilk characteristics, calcium sulfate coagulant. Journal of Food Science. 55(4): 1088-1107.
Liu KS (1997) Soybeans: chemistry, technology, and utilization. Chapman & Hall: New York.
Liu ZS, Chang SKC (2004) Effect of soy milk characteristics and cooking conditions on coagulant requirements for making filled tofu. Journal of Agricultural and Food Chemistry. 52(11): 3405-3411.
Mian MAR, Cooper RL, Dorrance AE (2008) Registration of ‘Prohio’ soybean. Journal of Plant Registrations. 2: 208-210.
Min S, Yu Y, Martin SS (2005) Effect of soybean varieties and growing locations on the physical and chemical properties of soymilk and tofu. Journal of Food Science. 70(1): C8-C21.
Mujoo R, Trinh DT, Ng PK (2003) Characterization of storage proteins in different soybean varieties and their relationship to tofu yield and texture. Food Chemistry. 82(2): 265-273.
Mullin WJ, Fregeau-Reid JA, Butler M, Poysa V, Woodrow L, Jessop DB, Raymond D (2001) An interlaboratory test of a procedure to assess soybean quality for soymilk and tofu production. Food Research International. 34: 669-677.
Mullin WJ, Xu W (2001) Study of soybean seed coat components and their relationship to water absorption. Journal of Agricultural and Food Chemistry. 49(11): 5331-5335.
Poysa V, Woodrow L (2002) Stability of soybean seed composition and its effect on soymilk and tofu yield and quality. Food Research International. 35: 337-345.
Poysa V, Woodrow L, Yu K (2006) Effect of soy protein subunit composition on tofu quality. Food Research International. 39(3): 309-317.
Shen CF, De Man L, Buzzell RI, De Man JM (1991) Yield and quality of tofu as affected by soybean and soymilk characteristics: Glucono-delta-lactone coagulant. Journal of Food Science. 56(1): 109-112.
Shih MC, Hou HJ, Chang KC (1997) Process optimization for soft tofu. Journal of Food Science. 62(4): 833-837.
Shih MC, Yang KT, Kuo SJ (2002) Quality and antioxidative activity of black soybean tofu as affected by bean cultivar. 67(2): 480-484.
St. Martin SK, Calip-DuBois AJ, Fioritto RJ, Schmitthenner AF, Min DB, Yang T-S, Yu YM, Cooper RL, Martin RJ (1996) Registration of ‘Ohio FG1’ Soybean. Crop Science. 26: 813.
St. Martin SK, Feller MK, Fioritto MJ, McIntyre SA, Dorrance AE, Berry SA, Sneller CH (2006) Registration of ‘HS0–3243’ Soybean. Crop Science. 46: 1811.
St. Martin SK, Mills GR, Fioritto RJ, McIntyre SA, Dorrance AE, Berry SA (2006) Registration of ‘Ohio FG5’ Soybean. Crop Science. 46(6): 2709-2709.
St. Martin SK, Mills GR, Fioritto RJ, McIntyre SA, Dorrance AE, Cooper RL (2004) Registration of ‘Ohio FG3’ soybean. Crop Science. 44: 687.
St. Martin SK, Feller MK, McIntyre SA, Fioritto RJ, Dorrance AE, Berry SA, Sneller CH (2008) Registration of ‘Dennison’ Soybean. Journal of Plant Registrations. 2: 21.
Sun N, Breene WM (1991) Calcium sulfate concentration influence on yield and quality of tofu from five soybean varieties. Journal of Food Science. 56(6): 1604-1607.
Wang CCR, Chang SKC (1995) Physicochemical properties and tofu quality of soybean cultivar Proto. Journal of Agricultural and Food Chemistry. 43: 3029-3034.
Yaklich RW, Vinyard B, Camp M, Douglass S (2002) Analysis of seed protein and oil from soybean northern and southern region uniform tests. Crop Science. 42(5): 1504-1515.
Yamaura K (2011) Market power of the Japanese non-GM soybean import market: The US exporters vs. Japanese importers. Asian Journal of Agriculture and Rural Development. 1(2): 80-89.
Yasir SBM, Sutton KH, Newberry MP, Andrews NR, Gerrard JA (2007) The impacts of transglutaminase on soy proteins and tofu texture. Food Chemistry. 104: 1491-1501.
Yuan S, Chang SKC (2007) Texture profile of tofu as affected by Instron parameters and sample preparation, and correlations of Instron hardness and springiness with sensory scores. Journal of Food Science. 72(2): S136-S145.

CHAPTER 3

ANALYSIS OF POPULATION STRUCTURE IN A SOYBEAN BREEDING PROGRAM OF COMMODITY AND SPECIALTY TYPES

Abstract

Many soybean breeding programs are now focusing on the development of specialty soybean types for niche markets. These specialty soybean cultivars require modified fatty acid profiles for the vegetable oil market or high protein, large seed size, and clear hilum for the food-grade soybean market. The selection for different specialty types within a single breeding program may result in the structuring or genetic differentiation of the population. Population structure and the phenotypes important to commodity, modified fatty acid, and food-grade soybean markets were evaluated in a breeding population of 242 lines composed of both commodity and specialty type soybeans.
Two subpopulations associated with the phenotypes of specialty soybean types were identified on the basis of 504 single nucleotide polymorphism markers assayed on the breeding population. Our results indicate that both the selection of parents and directional selection among progeny in a single selection event can contribute to the structuring of this breeding population.

INTRODUCTION

Increasing soybean yields has been an important breeding goal (Kim et al., 2012). In 2011, more than 9.2 billion bushels of soybeans were produced worldwide, with 33% of that coming from the United States (The American Soybean Association, 2012). Over the past 80 years, soybean breeding programs in North America have been successful in increasing yields, with genetic gains in soybean yield across maturity groups and regions ranging from 0.87% to 3.49% (Lange and Fedrizzi, 2009). In addition to increasing yield, soybean breeding programs have further diversified their goals, with recent emphasis being placed on various specialty traits. This diversification of breeding goals may result in genetic differentiation, or population structure, within a breeding population. Understanding population structure can be important in efficiently introducing and utilizing genetic diversity in a breeding program (Glaszmann et al., 2010).

Because soybean is the world’s major seed crop for vegetable oil production, modified fatty acid composition of soybean oil has become an important trait for which specialty soybean types have been selected and developed (Lee et al., 2007). In particular, high oleic acid, low saturated fatty acid, and low linolenic acid content are important for human health and oil stability (Miller et al., 1987; Grundy et al., 1988). Mutations in specific genes in the fatty acid biosynthesis pathway have been exploited for the development of specialty soybeans with modified oil profiles (e.g., Burton et al., 2004; Burton et al., 2006; Shannon et al., 2005).
The genetic background and loci of small effect have also been shown to be important to the expression of the modified fatty acid traits. Thus, selection for specific modified fatty acid profiles can involve several to many loci (Burton et al., 2006; Panthee et al., 2006; Han et al., 2011).

High seed protein content is an important specialty trait selected for in soybean breeding programs. Soybeans contribute to more than 70% of the protein consumed by humans; their utilization as a protein source is demonstrated by their widespread use as a feed ingredient for livestock and poultry production (Krishnan, 2005). Additionally, high protein content is one of the desired traits in food-grade soybeans, which are soybeans produced specifically for tofu and soymilk (Wang et al., 1983). As a result of an increased demand for soy foods, the development of food-grade soybean cultivars is becoming an important focus of many breeding programs (Poysa et al., 2006). Food-grade soybeans have a suite of requisite characters, including a clear hilum, large seed size, and high protein content, and improvement of this set of traits requires selection at numerous loci (Wang et al., 1983).

Within a given breeding program, parental and progeny selection for these specialty traits may result in structuring of the breeding population and the creation of genetically distinct subpopulations. Previous studies on soybean population structure have been conducted in regional populations of cultivated and wild soybeans collected from Japan, China, and Korea (Hirata et al., 1999; Yan et al., 2003; Kuroda et al., 2006) or in populations of one specialty soybean type from multiple regions (Shi et al., 2010). These studies have shown the presence of subpopulations associated with the geographic origin of the soybean genotypes. However, these studies do not address population structure within a single breeding program or between soybean specialty types.
As a result of selection for different specialty types, a population of soybean lines from a single breeding program may exhibit genetic differentiation resulting in subpopulations.

Identifying population structure is a requisite first step in conducting linkage disequilibrium mapping, a useful tool in understanding and utilizing the genetic diversity within a breeding program (Cardon and Palmer, 2003). The software STRUCTURE has been useful in objectively detecting population structure using genotypic data (Falush et al., 2003). STRUCTURE implements a Bayesian model-based clustering method. Compared to purely distance-based methods, it avoids the somewhat arbitrary selection of a distance threshold or the selection of clusters “by eye” from a graphical display (Pritchard et al., 2000). By including in the model a vector representing each individual’s admixture proportions, the methods implemented in STRUCTURE can also estimate each individual’s proportion of membership in each population.

In this study, population structure is examined in a collection of 242 lines from a breeding program which aims to develop both food-grade and modified fatty acid specialty soybean cultivars, as well as commodity (high yield) cultivars (for examples: St. Martin et al., 1996; St. Martin et al., 2004; St. Martin et al., 2006a; St. Martin et al., 2006b; McHale et al., 2012). The objectives were to 1) determine whether genotypic data provide evidence of subpopulations within a soybean breeding program for both specialty and commodity soybean cultivars, 2) explore the relationship of subpopulations with phenotypic data, and 3) determine whether population structuring is related to pedigree and/or progeny selection. The results indicate that there are two distinct subpopulations within this breeding population. One of the subpopulations consists primarily of lines with food-grade cultivars in their pedigree.
The observed phenotypic differences are associated with these genotypically determined subpopulations. The data also support the conclusion that the detected genetic substructure is associated with pedigree history and, to a lesser extent, with directional selection on progeny occurring in a single selection event.

MATERIALS AND METHODS

Plant Populations and DNA Isolation

The population, grown in 2010, consisted of 242 F4 lines derived from crosses between numerous maturity group (MG) II and III public varieties and plant introductions, with additional cultivars used as trait, maturity, and yield checks (Table 3.1). Lines were grown in separate yield tests for early or late maturity and for advanced or preliminary lines, depending on their maturity group (MGII or MGIII) and their generation (F4:7-9 or F4:6), respectively. Lines for each test were planted in a randomized complete block design with two replicates for the preliminary lines (F4:6) or three replicates for the advanced lines (F4:7-9) at four Ohio locations: South Charleston, Plain City, Hoytville, and Wooster. The early maturing advanced lines (ALTA), the late maturing advanced lines (ALTB), and the large seeded advanced lines (LST) were planted in six-row plots at four locations, with a plot size of 4 m x 2.5 m in South Charleston and Plain City and 4.9 m x 3 m in Hoytville and Wooster. The preliminary tests for the early maturing lines (OPTA1, OPTA2) and the late maturing lines (OPTB) were planted at three sites (South Charleston, Hoytville, and Wooster). In South Charleston, OPT tests were planted in three-row plots with a plot size of 4 m x 1.3 m. In Hoytville and Wooster, tests were planted in two-row plots with a plot size of 4.9 m x 0.6 m. Seeds were harvested from each location during the fall of 2010. Check cultivars were included in multiple yield tests and used to normalize data. For each line, leaf tissue from the first two true leaves of nine seedlings was collected in liquid nitrogen and lyophilized.
DNA was extracted from each line following a protocol from Keim et al. (1988) adapted for use in a 96-well plate.

Breeding lines were placed into three groups on the basis of their parental pedigree: commodity, modified fatty acid, or food-grade/high protein. Individuals with at least one parent with modified fatty acid or food-grade characteristics (high protein or large seed with clear or yellow hilum) were placed into the pedigree-based modified fatty acid group or the food-grade/high protein group, respectively. All other individuals were placed in the commodity group.

Individuals were also placed into groups on the basis of observed phenotypes. Individuals with low combined saturated fatty acids, high oleic acid, and/or low linolenic acid were placed into the phenotype-based modified fatty acid group. Individuals with high seed protein content and/or large seed size with a clear hilum were placed into the phenotype-based food-grade/high protein group. Individuals with yields exceeding the check cultivars were selected for the commodity group. These sub-groupings correspond to the individuals which would be selected for continuation in the breeding program. All other individuals were placed into the non-selected group.

Collection of Genotypic Data

Ninety-six individuals, comprising representative lines from public breeding programs in Ohio and parents of mapping populations, were genotyped with the Universal Soy Linkage Panel 1.0, which is comprised of 1,536 SNP markers (Hyten et al., 2010), using the GoldenGate assay (Illumina Inc., San Diego, CA). From this screen for polymorphic markers, an initial set of 384 markers was selected for high polymorphism information content (PIC) and even genome distribution. An additional set of 384 markers was added in order to fill in gaps between markers in the first set (Hyten et al., 2010). These 768 markers were assayed on all 242 lines and genotypes were assigned using the GenomeStudio software (Illumina Inc.).
Uninformative markers were removed, including markers that were monomorphic, markers with poor data (> 24% missing scores), markers in the first set with a minor allele frequency < 3.05%, and markers in the second set with a minor allele frequency < 0.4%.

Collection of Phenotypic Data

Seed oil and protein concentrations and fatty acid composition were measured for all lines at each location and each block replicate. Seed oil and protein concentrations were measured at the National Center for Agricultural Utilization Research (NCAUR) in Peoria, IL using Near Infrared (NIR) technology with an Infratec 1255 Food & Feed Analyzer (UltraTec Manufacturing Inc., Santa Ana, CA). Fatty acid composition data were collected by gas chromatography of fatty acid methyl esters at NCAUR using an Agilent Technologies 6890 GC equipped with an autosampler (Agilent Technologies, Santa Clara, CA).

Seed weight, volume, density, and shape measurements were collected from 242 lines at each location with a single replicate. One-hundred seed weight was measured by scale (OHAUS Adventurer™ Pro, Model AV212). Seed volume was measured by water displacement of 50 dry seeds in 25 ml of water in a 50 ml graduated cylinder. The displaced volume was divided by 50 to obtain the volume per seed. Seed density was calculated as seed weight divided by seed volume. Seed straight length, straight width, and length to width ratio were measured by scanning the seeds and analyzing the images with WinSEEDLE software (Regent Instruments Inc., Canada).

Statistical Analysis of Phenotypic Data

For each trait, the genotypic effect of each individual was estimated with Best Linear Unbiased Prediction (BLUP) values calculated with SAS software v. 9.2 (SAS Institute Inc., Cary, NC).
The model used was:

$Y_{ijkl} = \mu + C_j + G_i(C_j) + L_k + T_l(L_k) + L_k \times G_i(C_j) + \varepsilon_{ijkl}$,

where $Y_{ijkl}$ is the observed value for a given trait, $\mu$ is the overall mean, $C_j$ is the effect of the jth class of cultivar or breeding line, in which j is equal to a number from one to nineteen for the check cultivars and twenty for all experimental breeding lines, $G_i(C_j)$ is the effect of the ith check or experimental breeding line within the jth class, $L_k$ is the effect of the kth location, $T_l(L_k)$ is the effect of the lth trial within the kth location, $L_k \times G_i(C_j)$ is the interaction effect of the kth location with the ith cultivar or breeding line within a class, and $\varepsilon_{ijkl}$ is the error associated with the observation. All effects were treated as random except $C_j$, which was treated as fixed.

Analysis of Population Substructure

The software STRUCTURE was used to cluster the population into subpopulations and to generate a Q-matrix indicating each individual’s proportion of membership in a subpopulation (Qw; Pritchard et al., 2000). With an initial burn-in of 100,000 iterations followed by 100,000 Markov chain Monte Carlo (MCMC) iterations, the number of subpopulations (K) was varied from K = 1 to K = 15 with 20 replications. StrAuto was used to automate STRUCTURE runs (Chhatre, 2012). The output of the STRUCTURE runs was summarized using the Evanno method via STRUCTURE HARVESTER (Dent and Bridgett, 2012), in which the true K is selected as the peak of $\Delta K = \mathrm{mean}(|L''(K)|) / s[L(K)]$, where $L(K)$ is the log likelihood of K, $L''(K)$ is the second order rate of change of $L(K)$, and $s[L(K)]$ is the standard deviation of $L(K)$ (Evanno et al., 2005). A longer run with an increased burn-in (500,000) and MCMC (500,000) and with K fixed at 2 was carried out to determine population membership for each line.

The number of subpopulations was further confirmed and the population structure further explored by conducting Principal Component Analysis (PCA) in R software. For PCA of genotypic data, the imputed marker data file consisting of 242 breeding lines or cultivars and 504 markers was used.
Markers were scored as “2” for the homozygous common allele, “0” for the homozygous rare allele, and “1” for the heterozygote. For the PCA of phenotypic data, all 242 breeding lines or cultivars were included and a total of 14 traits were assessed: yield, seed protein and oil content, oil palmitic, stearic, oleic, linoleic, and linolenic content, and seed length, width, length to width ratio, weight, volume, and density. For both genotypic and phenotypic PCA, a correlation matrix was used and the analysis was conducted in R using the prcomp() command. A scree diagram plotting the eigenvalue of each component against the component number was examined to assist in selecting an appropriate number of PCs (results not shown). Three PCs were chosen and plotted for genotypic data and two PCs were chosen and plotted for phenotypic data.

RESULTS

Genotypic data

A total of 504 markers were included in the genotypic analysis. According to the soybean consensus map v4.0 (Hyten et al., 2010), markers were evenly distributed across all chromosomes, with an average gap distance of 4.3 cM. The largest gap between markers was 34.2 cM.

Population structure

The data set consisting of 504 markers assayed on 242 individuals was used to estimate population structure using the Bayesian model-based methods implemented in STRUCTURE. To obtain an estimate of the best fitting number of clusters (K), values of K ranging from 1 to 15 were tested using the ΔK method, which revealed a peak of ΔK at K = 2 (Figure 3.1) (Evanno et al., 2005). However, as ΔK cannot be calculated for K = 1, further investigation into whether K was equal to 1 or 2 was required. Clustering of individuals based on genotypic data was observed from PCA. The first three principal components, describing 19.9% of the genotypic variance, indicate that the individuals do loosely cluster into the two subpopulations defined by STRUCTURE at K = 2 (Figures 3.2 and 3.3).
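The ΔK criterion, and the reason it cannot be evaluated at the smallest K tested, can be illustrated with a short sketch. This is a minimal Python illustration and not the STRUCTURE HARVESTER implementation: the function name `delta_k` and the toy log-likelihood values are hypothetical, and the second-order rate of change is assumed here to be computed from the replicate means as |L(K+1) − 2L(K) + L(K−1)|.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps each tested K to the list of log-likelihood values L(K)
    obtained from its replicate runs (at least two runs per K).
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # no second difference at the ends of the tested range
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        spread = stdev(lnp_by_k[k])
        if spread > 0:  # delta K is undefined when all replicates are identical
            result[k] = second_diff / spread
    return result


# Hypothetical log-likelihood values for K = 1..4, three replicate runs each.
toy_runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-800.0, -801.0, -799.0],
    3: [-790.0, -795.0, -785.0],
    4: [-785.0, -795.0, -775.0],
}
dk = delta_k(toy_runs)
best_k = max(dk, key=dk.get)  # K with the largest delta K
print(dk, best_k)
```

Because the second difference needs both L(K−1) and L(K+1), the sketch returns no value for the first and last K tested, which is why a peak at K = 2 cannot by itself rule out K = 1.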
The two subpopulations defined by STRUCTURE are hereafter referred to as the major and minor subpopulations, where the major subpopulation is the larger of the two, with 65% of the alleles in the full population attributed to it (Figure 3.2a). The majority of the lines exhibited high levels of admixture (Figure 3.2). For this study, an admixed individual was defined as an individual having less than 90% of its alleles attributed to a single subpopulation. By this criterion, over half (50.2%) of the lines in the population were admixed (Figure 3.2). Eighty-five of 242 lines (35.1%) belonged primarily to the major subpopulation with little to no admixture, represented as more than 90% of the alleles of a single line attributed to the major subpopulation (Figure 3.2). In contrast, only 34 of 242 lines (14%) were members of the minor subpopulation with little or no admixture (Figure 3.2).

Effect of pedigree on population structure

The pedigree-based food-grade/high protein group had 94% of its alleles attributed to the minor subpopulation (Figure 3.2a) and was significantly different from all other groups (Figure 3.4). In contrast to the food-grade/high protein group, the allele distributions of the pedigree-based modified fatty acid and commodity groups were not significantly different from each other or from the allele distribution of the entire population (Figure 3.4).

Effect of phenotypic selection on population structure

The phenotype-based groups represent how lines would be selected in a breeding program, as compared to the pedigree-based groups. The alleles represented in the phenotype-based food-grade/high protein group were less dominated by the minor subpopulation alleles and closer to the mean subpopulation membership observed in the population as a whole (Figures 3.2 and 3.4).
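The 90% admixture criterion defined above can be expressed as a short classification rule. The sketch below is a minimal Python illustration, not the analysis code used in this study: the function name `classify_line`, the line names, and the membership values are hypothetical, and the input is assumed to be each line's proportion of membership in the major subpopulation from a K = 2 Q-matrix. A value of exactly 90% is treated as admixed, which is an assumption of the sketch.

```python
from collections import Counter


def classify_line(q_major, threshold=0.90):
    """Classify one line from its K = 2 membership proportion.

    q_major is the proportion of alleles attributed to the major
    subpopulation; the minor proportion is taken as 1 - q_major.
    """
    if not 0.0 <= q_major <= 1.0:
        raise ValueError("membership proportion must lie in [0, 1]")
    if q_major > threshold:
        return "major"
    if 1.0 - q_major > threshold:
        return "minor"
    return "admixed"  # neither subpopulation exceeds the threshold


# Hypothetical membership proportions for three lines.
toy_q = {"line_A": 0.97, "line_B": 0.04, "line_C": 0.62}
labels = {name: classify_line(q) for name, q in toy_q.items()}
counts = Counter(labels.values())
print(labels, counts)
```

Tallying the labels in this way gives the counts of major, minor, and admixed lines that are reported as proportions of the population.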
In concordance with the mean subpopulation membership for the pedigree-based and phenotype-based food-grade/high protein groups, the genotypic PCs differentiated the pedigree-based food-grade/high protein group from the entire population more than they differentiated the phenotype-based food-grade/high protein group from the entire population (Table 3.2). Ten lines from the pedigree-based commodity group (lacking a food-grade/high protein line in their pedigrees) were categorized in the food-grade/high protein group based on phenotype (Table 3.1). This “reclassification” of lines from the pedigree-based commodity group to the phenotype-based food-grade/high protein group contributed to the mean subpopulation membership of this group becoming closer to the mean subpopulation membership of the entire population.

In contrast to the effects of selection on the food-grade/high protein groups, the mean subpopulation membership of the phenotype-based modified fatty acid group was further from the mean subpopulation membership of the whole population (Figures 3.2 and 3.4). This was in concordance with the genotypic PCs, which differentiated the phenotype-based modified fatty acid group from the entire population more than they differentiated the pedigree-based modified fatty acid group (Table 3.2). The mean subpopulation memberships of the phenotype-based modified fatty acid and food-grade/high protein groups were significantly different from that of the phenotype-based commodity group (Figure 3.4).

Differentiation of phenotypes among populations and groups

PCA was conducted using phenotypic data (Figure 3.5). The first two PCs describe 44% of the phenotypic variance. However, in contrast to the distinct population structure apparent from PCA with genotypic data, there was no obvious clustering from PCA with phenotypic data (Figures 3.3 and 3.5).
The phenotypic variables which had a high contribution to PC1 were related to seed size and included seed straight length, straight width, weight, and volume (Table 3.3). The phenotypic variables which were the primary contributors to PC2 were primarily related to seed composition and included seed protein content, seed oil content, oil stearic and oleic acid content, seed density, and, interestingly, seed length to width ratio (Table 3.3). The phenotypic PCA separates individuals according to both the pedigree-based and phenotype-based groups. The food-grade/high protein groups were significantly different from other groups and had the highest average absolute values for PC1 (seed size; Table 3.2). The modified fatty acid groups were significantly different from other groups and had the highest absolute values for PC2 (seed composition; Table 3.2).

DISCUSSION

At least two subpopulations were detected within the soybean breeding population studied here. This is in concordance with other studies indicating that population structure appears to be common both between and within breeding programs. A barley population from eight breeding programs was separated into subpopulations; some subpopulations were specific to a breeding program, lines from a single breeding program were in some cases divided among multiple subpopulations, and some subpopulations were phenotype specific (two-row versus six-row barley) (Wang et al., 2012b). In a maize breeding program, four subpopulations reliably identified the four different heterotic groups in the breeding program (Van Inghelandt et al., 2010). Both the barley and maize studies on population structure showed that the results from the Bayesian model-based methods implemented by STRUCTURE were consistent with PCA (Van Inghelandt et al., 2010; Wang et al., 2012b).

Within the soybean breeding program studied here, both the choice of parental lines and the selection among progeny were related to population structure.
Given that this study focused on a single population, it is unclear whether these results will be applicable to a range of populations and selection criteria. However, the effect of pedigree, or parental selection, in this population was evident from the extreme bias towards alleles from the minor subpopulation in the food-grade/high protein group defined on the basis of pedigree. The present study only evaluated a single selection event. Yet, in the modified fatty acid group, there was evidence that a breeding population could become structured in a single cycle of selection.

While we are aware of no other studies which have looked at the effects of a single selection event on population structure, previous studies have explored the effects of long-term selection on population structure. Jones et al. (2011) studied 651 barley landraces with both genotypic and phenotypic data and clustered these lines into nine groups associated with geographic distribution and, to some degree, phenotypes, suggesting that human selection and environmental adaptation were factors affecting the population structure of barley landraces. Wang et al. (2012a) identified two subpopulations, representing Chinese wheat landraces or modern wheat varieties, in a wheat mini-core collection consisting of 262 accessions. Significant differences in kernel weights and in allele frequencies of loci associated with kernel weight were detected between the two subpopulations, implying that artificial selection has improved wheat kernel weight and contributed to genetic differentiation of the subpopulations over the past six decades of the breeding program.

Likewise, the extent of population structure is greater in crops where subtypes have long been selected for and are firmly defined by the industry. Genetic subpopulations of corn distinguished sweet corn and popcorn germplasm (Liu et al., 2003). Japonica and indica rice were also clearly separated into distinct subpopulations (Garris et al., 2005).
Subpopulations within elite wheat germplasm clearly aligned with market classification and geographic origin for soft and hard wheat (Zhang et al., 2010). Similarly, subpopulations of tomato were in concordance with defined market classes (Sim et al., 2011). In these cases, subpopulations have a clear association with subtype or industry classification of the crop. The population structure observed in these studies was likely the result of long-term parental and progeny selection.

Soybean was introduced as a small crop in North America in 1765 and, following World War II, its production greatly expanded for its use in animal feed (Hymowitz and Harlan, 1983). The introduction and subsequent breeding of soybean for North America was associated with an intense bottleneck, such that 75% of the genetic diversity in modern North American cultivars can be represented by only 17 founding lines (Gizlice et al., 1994). While the breeding history of soybean for adaptation to North America is relatively short, the breeding history for specialty types is extremely recent. The first registered cultivar developed for tofu purposes was released in 1981 (Fehr et al., 1984), and the food-grade market has been emphasized since the early 1990s (Lim et al., 1990). In 2007, the US updated the trans-fat labeling laws, creating an emphasis on fats in relation to human health and an industrial need for methods of oil stabilization other than hydrogenation, which notably creates trans-fats (American Public Health Association, 2008). This was the impetus for modified fatty acid soybean breeding programs in North America. As is seen in other crops, soybean may be trending towards the establishment of distinct subpopulations for specialty types (Garris et al., 2005; Sim et al., 2011; Wang et al., 2012b).
It is clear that over many cycles of selection in a breeding population, selection for differing phenotypic groups within the population can result in genetic differentiation to the extent that subpopulations are developed. The number of cycles of selection required for this to occur is likely dependent on the level of selection, the heritability of the trait, and the number of loci involved in the trait. However, the population membership of the modified fatty acid groups in the current study provides evidence that genetic differentiation can begin in a single cycle of selection.

Understanding how selection affects population structure in a breeding program can provide valuable information on the selection of parental lines and new sources of alleles for a breeding program. The contribution of lines from the pedigree-based commodity group to the phenotype-based food-grade/high protein group provides evidence that, in this breeding population, valuable alleles can be contributed to a specialty type from cultivars outside of that specialty type. Further analysis within additional populations may provide evidence of haplotypes important to specific specialty types.

Credits: Genotyping was conducted by MCIC. Seed protein, oil, and fatty acid measurements were conducted by NCAUR. All other work was conducted by Mao Huang with assistance from members of the McHale lab with seed trait measurements. Dr. David Francis and Dr. Steve St. Martin assisted with statistical data analysis.

TABLES AND FIGURES

Table 3.1. Pedigrees, locations, groups and population membership for cultivars and breeding lines.

| Breeding line or cv. name | Pedigree | Locations† | Pedigree-based group‡ | Phenotype-based group‡ | Structure§ |
|---|---|---|---|---|---|
| Dennison¶ | Athow x HS94-4533 | H, P, S, W | YLD | YLD | 0.003 |
| H09-106 | SF114 x Wyandot | H, P, S, W | FG/HP | FG/HP | 0.948 |
| H09-260 | SF114 x Ohio FG5 | H, P, S, W | FG/HP | FG/HP | 0.982 |
| H09-266 | SF114 x Ohio FG5 | H, P, S, W | FG/HP | FG/HP | 0.996 |
| H09-277 | SF114 x Ohio FG5 | H, P, S, W | FG/HP | FG/HP | 0.986 |
| H09-4 | SF114 x Wyandot | H, P, S, W | FG/HP | FG/HP | 0.913 |
| HS0-3243¶ | HS93-4118 x Kottman | H, S, W | YLD | YLD | 0.03 |
| HS5-3519 | HS99-4577 x PI133293 | H, P, S, W | YLD | NS | 0.724 |
| HS6-3705A | HS99-4256 x Dilworth | H, S, W | YLD | NS | 0.475 |
| HS6-3705B | HS99-4256 x Dilworth | H, S, W | YLD | NS | 0.579 |
| HS6-3705C | HS99-4256 x Dilworth | H, S, W | YLD | NS | 0.558 |
| HS6-3705D | HS99-4256 x Dilworth | H, S, W | YLD | NS | 0.49 |
| HS6-3705E | HS99-4256 x Dilworth | H, S, W | YLD | NS | 0.547 |
| HS6-3705F | HS99-4256 x Dilworth | H, S, W | YLD | NS | 0.492 |
| HS6-3705G | HS99-4256 x Dilworth | H, S, W | YLD | NS | 0.481 |
| HS6-3705-R | HS99-4256 x Dilworth | H, P, S, W | YLD | NS | 0.516 |
| HS6-3967A | HS98-78262 x PI399073 | H, S, W | YLD | FG/HP | 0.003 |
| HS6-3967B | HS98-78262 x PI399073 | H, S, W | YLD | YLD | 0.002 |
| HS6-3967C | HS98-78262 x PI399073 | H, S, W | YLD | NS | 0.002 |
| HS6-3967D | HS98-78262 x PI399073 | H, S, W | YLD | YLD | 0.002 |
| HS6-3967-R | HS98-78262 x PI399073 | H, P, S, W | YLD | NS | 0.002 |
| HS6-3971-R | HS98-78262 x PI399073 | H, P, S, W | YLD | NS | 0.002 |
| HS6-3973A | HS98-78262 x PI399073 | H, S, W | YLD | NS | 0.71 |
| HS6-3973B | HS98-78262 x PI399073 | H, S, W | YLD | NS | 0.002 |
| HS6-3973C | HS98-78262 x PI399073 | H, S, W | YLD | NS | 0.002 |
| HS6-3973D | HS98-78262 x PI399073 | H, S, W | YLD | NS | 0.003 |
| HS6-3973-R | HS98-78262 x PI399073 | H, P, S, W | YLD | NS | 0.002 |
| HS7-4176 | HS1-3641 x H2885 | H, P, S, W | YLD | NS | 0.748 |
| HS7-4314 | HS1-3641 x HS1-7116 | H, P, S, W | YLD | YLD | 0.412 |
| HS7-4437 | HS1-3661 x IA3017 | H, P, S, W | FA | NS | 0.496 |
| HS7W-127 | Kottman x Dilworth | H, P, S, W | YLD | YLD | 0.009 |
| HS7W-136 | Kottman x Dilworth | H, P, S, W | YLD | NS | 0.005 |
| HS7W-190 | HS1-3641 x HS1-3907 | H, P, S, W | YLD | FG/HP | 0.165 |
| HS7W-191 | HS1-3641 x HS1-3907 | H, P, S, W | YLD | NS | 0.154 |
| HS7W-194 | HS1-3641 x HS1-3907 | H, P, S, W | YLD | FG/HP | 0.171 |
| HS7W-29 | H2885 x HF99-019 | H, P, S, W | YLD | YLD | 0.611 |
| HS7W-82 | HS1-3641 x HS1-7116 | H, P, S, W | YLD | YLD | 0.26 |
| HS7W-94 | HS1-5870 x Ohio FG4 | H, P, S, W | FG/HP | NS | 0.924 |
| HS8-3284 | Kottman x HS1-7267 | H, P, S, W | YLD | NS | 0.241 |
| HS8-3289 | Kottman x HS1-7267 | H, P, S, W | YLD | NS | 0.283 |
| HS8-3317 | HS3-25233 x (Williams x PI 424354) | H, P, S, W | YLD | YLD | 0.003 |
| HS8-3331 | HS3-25233 x (Williams x PI 424354) | H, P, S, W | YLD | NS | 0.004 |
| HS8-3334 | OHS 3033 x (Williams x PI424354) | H, P, S, W | YLD | NS | 0.66 |
| HS8-3341 | Dennison2 x HFPR4 | H, P, S, W | YLD | NS | 0.002 |
| HS8-3362 | Dennison2 x HFPR4 | H, P, S, W | YLD | YLD | 0.002 |
| HS8-3451 | HS3-25233 x (Williams x PI 424354) | H, P, S, W | YLD | FG/HP | 0.016 |
| HS8-3459 | HS3-25233 x (Williams x PI 424354) | H, P, S, W | YLD | NS | 0.021 |
| HS8-3463 | OHS 3033 x (Williams x PI424354) | H, P, S, W | YLD | YLD | 0.674 |
| HS8-3486 | Dennison2 x HFPR4 | H, P, S, W | YLD | NS | 0.143 |
| HS8-3538 | Dennison2 x HFPR4 | H, P, S, W | YLD | NS | 0.002 |
| HS8-3582 | HS3-25233 x (Williams x PI 424354) | H, P, S, W | YLD | NS | 0.004 |
| HS8-3657 | HS1-3661 x LG00-3372 | H, P, S, W | YLD | YLD | 0.75 |
| HS8-3664 | HS1-3661 x LG00-3372 | H, P, S, W | YLD | YLD | 0.965 |
| HS8-3667 | HS1-3661 x LG00-3372 | H, P, S, W | YLD | NS | 0.769 |
| HS8-3672 | HS1-3661 x LG00-3372 | H, P, S, W | YLD | YLD | 0.462 |
| HS8-3713 | C2033 x HS1-423 | H, P, S, W | YLD | NS | 0.874 |
| HS8W-1 | HS0-3243 x LG00-3372 | H, P, S, W | YLD | NS | 0.296 |
| HS8W-102 | HS2-8086 x HS1-7267 | H, P, S, W | YLD | FG/HP | 0.791 |
| HS8W-103 | HS2-8086 x HS1-7267 | H, P, S, W | YLD | FG/HP | 0.789 |
| HS8W-106 | HF01-0821 x Kottman | H, P, S, W | YLD | YLD | 0.133 |
| HS8W-115 | HF01-0821 x Kottman | H, P, S, W | YLD | NS | 0.091 |
| HS8W-156 | HS1-6811 x IA 3017 | H, P, S, W | YLD | NS | 0.112 |
| HS8W-177 | HS1-3641 x HS1-7116 | H, P, S, W | YLD | NS | 0.248 |
| HS8W-178 | HS1-424 x Ohio FG4 | H, P, S, W | FG/HP | NS | 0.995 |
| HS8W-179 | HS1-424 x Ohio FG4 | H, P, S, W | FG/HP | FG/HP | 0.994 |
| HS8W-183 | HS1-3641 x HS1-3907 | H, P, S, W | YLD | NS | 0.124 |
| HS8W-184 | HS0-8435 x HS1-3907 | H, P, S, W | YLD | NS | 0.519 |
| HS8W-185 | HS0-8435 x HS1-3907 | H, P, S, W | YLD | NS | 0.366 |
| HS8W-23 | HS1-3661 x HF01-0821 | H, P, S, W | YLD | NS | 0.544 |
| HS8W-3 | HS0-3243 x LG00-3372 | H, P, S, W | YLD | YLD | 0.33 |
| HS8W-30 | HS1-3661 x HF01-0821 | H, P, S, W | YLD | YLD | 0.575 |
| HS8W-503 | Dennison x N98-4445A | H, S, W | FA | NS | 0.045 |
| HS8W-504 | Dennison x N98-4445A | H, S, W | FA | YLD | 0.02 |
| HS8W-507 | Dennison x N98-4445A | H, S, W | FA | NS | 0.027 |
| HS8W-510 | Dennison x N98-4445A | H, P, S, W | FA | YLD | 0.024 |
| HS8W-514 | Dennison x N98-4445A | H, S, W | FA | FG/HP | 0.03 |
| HS8W-515 | Dennison x N98-4445A | H, S, W | FA | NS | 0.006 |
| HS8W-517 | Dennison x N98-4445A | H, P, S, W | FA | FG/HP | 0.251 |
| HS8W-518 | Dennison x N98-4445A | H, P, S, W | FA | FG/HP | 0.108 |
| HS8W-520 | Dennison x N98-4445A | H, P, S, W | FA | NS | 0.082 |
| HS8W-521 | Dennison x N98-4445A | H, S, W | FA | NS | 0.219 |
| HS8W-54 | HS1-3661 x HF01-0821 | H, P, S, W | YLD | YLD | 0.502 |
| HS8W-56 | HS1-3661 x HF01-0821 | H, P, S, W | YLD | NS | 0.59 |
| HS8W-58 | HS1-3661 x HF01-0821 | H, P, S, W | YLD | NS | 0.512 |
| HS8W-68 | HS1-3661 x HF01-0821 | H, P, S, W | YLD | NS | 0.593 |
| HS8W-69 | HS1-3661 x HF01-0821 | H, P, S, W | YLD | NS | 0.69 |
| HS8W-8 | HS0-3243 x LG00-3372 | H, P, S, W | YLD | YLD | 0.201 |
| HS8W-82 | HS1-3661 x HF01-0821 | H, P, S, W | YLD | NS | 0.533 |
| HS8W-83 | HS1-3661 x HF01-0821 | H, P, S, W | YLD | NS | 0.464 |
| HS8W-93 | HS2-8086 x HS1-7267 | H, P, S, W | YLD | NS | 0.008 |
| HS8W-96 | HS2-8086 x HS1-7267 | H, P, S, W | YLD | YLD | 0.089 |
| M09-A003 | Dennison x HF03-546 | H, S, W | YLD | NS | 0.382 |
| M09-A037 | HFPR-4 x LS01-1987 | H, S, W | YLD | NS | 0.16 |
| M09-A044 | HFPR-4 x LS01-1987 | H, S, W | YLD | NS | 0.128 |
| M09-A045 | HFPR-4 x LS01-1987 | H, S, W | YLD | NS | 0.249 |
| M09-A047 | HFPR-4 x LS01-1987 | H, S, W | YLD | NS | 0.367 |
| M09-A059 | HS0-3243 x HF03-546 | H, S, W | YLD | NS | 0.296 |
| M09-A060 | HS0-3243 x HF03-546 | H, S, W | YLD | YLD | 0.191 |
| M09-A061 | HS0-3243 x HF03-546 | H, S, W | YLD | YLD | 0.159 |
| M09-A063 | HS0-3243 x HF03-546 | H, S, W | YLD | NS | 0.182 |
| M09-A075 | OHS 201 x IA3017 | H, S, W | FA | FA | 0.654 |
| M09-A076 | OHS 201 x IA3017 | H, S, W | FA | NS | 0.582 |
| M09-A077 | OHS 201 x IA3017 | H, S, W | FA | NS | 0.624 |
| M09-A078 | OHS 201 x IA3017 | H, S, W | FA | NS | 0.679 |
| M09-A079 | OHS 201 x IA3017 | H, S, W | FA | NS | 0.704 |
| M09-A086 | HS1-3661 x HS1-7531 | H, S, W | FA | NS | 0.506 |
| M09-A088 | HS1-3661 x HS1-7531 | H, S, W | FA | NS | 0.369 |
| M09-B001 | HS1-3661 x HS1-7531 | H, S, W | FA | NS | 0.519 |
| M09-B002 | HS1-3661 x HS1-7531 | H, S, W | FA | NS | 0.664 |
| M09-B004 | HS1-3661 x HS1-7531 | H, S, W | FA | NS | 0.666 |
| M09-B005 | HS1-3661 x HS1-7531 | H, S, W | FA | NS | 0.69 |
| M09-B010 | IA3017 x Dennison | H, S, W | FA | NS | 0.035 |
| M09-B012 | IA3017 x Dennison | H, S, W | FA | NS | 0.004 |
| M09-B013 | IA3017 x Dennison | H, S, W | FA | NS | 0.182 |
| M09-B015 | IA3017 x Dennison | H, S, W | FA | NS | 0.012 |
| M09-B016 | IA3017 x Dennison | H, S, W | FA | NS | 0.005 |
| M09-B017 | IA3017 x Dennison | H, S, W | FA | NS | 0.02 |
| M09-B019 | IA3017 x Dennison | H, S, W | FA | YLD | 0.004 |
| M09-B020 | IA3017 x Dennison | H, S, W | FA | FA | 0.007 |
| M09-B021 | IA3017 x Dennison | H, S, W | FA | NS | 0.01 |
| M09-B023 | IA3017 x Dennison | H, S, W | FA | NS | 0.005 |
| M09-B024 | IA3017 x Dennison | H, S, W | FA | NS | 0.413 |
| M09-B025 | IA3017 x Dennison | H, S, W | FA | YLD | 0.005 |
| M09-B026 | IA3017 x Dennison | H, S, W | FA | NS | 0.01 |
| M09-B027 | HS1-3661 x (OHS 201 x Md99-173-11-17) | H, S, W | FA | NS | 0.934 |
| M09-B028 | HS1-3661 x (OHS 201 x Md99-173-11-17) | H, S, W | FA | NS | 0.645 |
| M09-B030 | HS1-3661 x (OHS 201 x Md99-173-11-17) | H, S, W | FA | NS | 0.856 |
| M09-B031 | HS1-3661 x (OHS 201 x Md99-173-11-17) | H, S, W | FA | FA | 0.99 |
| M09-B032 | HS1-3661 x (OHS 201 x Md99-173-11-17) | H, S, W | FA | NS | 0.98 |
| M09-B033 | HS1-3661 x (OHS 201 x Md99-173-11-17) | H, S, W | FA | NS | 0.952 |
| M09-B034 | HS1-3661 x (OHS 201 x Md99-173-11-17) | H, S, W | FA | NS | 0.37 |
| M09-B036 | HS1-3661 x (OHS 201 x Md99-173-11-17) | H, S, W | FA | NS | 0.917 |
| M09-B037 | HS1-3661 x (OHS 201 x Md99-173-11-17) | H, S, W | FA | NS | 0.912 |
| M09-B038 | HS1-3661 x (OHS 201 x Md99-173-11-17) | H, S, W | FA | YLD | 0.989 |
| M09-W031 | OHS 304 x LG00-3372 | H, S, W | YLD | NS | 0.017 |
| M09-W032 | OHS 304 x LG00-3372 | H, S, W | YLD | FG/HP | 0.011 |
| M09-W033 | OHS 304 x LG00-3372 | H, S, W | YLD | YLD | 0.03 |
| M09-W034 | OHS 304 x LG00-3372 | H, S, W | YLD | NS | 0.008 |
| M09-W035 | HS0-3243 x Dennison | H, S, W | YLD | YLD | 0.003 |
| M09-W036 | HS0-3243 x Dennison | H, S, W | YLD | YLD | 0.005 |
| M09-W037 | HS0-3243 x Dennison | H, S, W | YLD | YLD | 0.002 |
| M09-W038 | HS0-3243 x Dennison | H, S, W | YLD | NS | 0.002 |
| M09-W039 | HS0-3243 x Dennison | H, S, W | YLD | YLD | 0.12 |
| M09-W041 | HS0-3243 x Dennison | H, S, W | YLD | YLD | 0.003 |
| M09-W042 | HS0-3243 x Dennison | H, S, W | YLD | NS | 0.007 |
| M09-W043 | HS0-3243 x Dennison | H, S, W | YLD | FG/HP | 0.005 |
| M09-W045 | HS0-3243 x Dennison | H, S, W | YLD | YLD | 0.002 |
| M09-W047 | HS0-3243 x Dennison | H, S, W | YLD | NS | 0.202 |
| M09-W049 | HS0-3243 x Dennison | H, S, W | YLD | NS | 0.002 |
| M09-W050 | HS0-3243 x Dennison | H, S, W | YLD | NS | 0.224 |
| M09-W051 | HS0-3243 x Dennison | H, S, W | YLD | YLD | 0.004 |
| M09-W052 | HS0-3243 x Dennison | H, S, W | YLD | YLD | 0.004 |
| M09-W053 | HS0-3243 x Dennison | H, S, W | YLD | YLD | 0.005 |
| M09-W054 | LG00-3372 x HS3-2523 | H, S, W | YLD | YLD | 0.241 |
| M09-W055 | LG00-3372 x HS3-2523 | H, S, W | YLD | NS | 0.269 |
| M09-W056 | LG00-3372 x HS3-2523 | H, S, W | YLD | NS | 0.298 |
| M09-W060 | LG00-3372 x HS3-2523 | H, S, W | YLD | NS | 0.323 |
| M09-W061 | LG00-3372 x HS3-2523 | H, S, W | YLD | NS | 0.308 |
| M09-W063 | LG00-3372 x HS3-2523 | H, S, W | YLD | YLD | 0.188 |
| M09-W065 | LG00-3372 x HS3-2523 | H, S, W | YLD | YLD | 0.275 |
| M09-W066 | LG00-3372 x HS3-2523 | H, S, W | YLD | YLD | 0.318 |
| M09-W084 | Dennison x HF03-546 | H, S, W | YLD | YLD | 0.276 |
| M09-W085 | Dennison x HF03-546 | H, S, W | YLD | NS | 0.173 |
| M09-W086 | Dennison x HF03-546 | H, S, W | YLD | NS | 0.004 |
| M09-W087 | Dennison x HF03-546 | H, S, W | YLD | YLD | 0.115 |
| M09-W088 | Dennison x HF03-546 | H, S, W | YLD | NS | 0.243 |
| M09-W089 | Dennison x HF03-546 | H, S, W | YLD | NS | 0.014 |
| M09-W095 | HS0-3243 x Dennison | H, S, W | YLD | FG/HP | 0.004 |
| M09-W096 | HS0-3243 x Dennison | H, S, W | YLD | NS | 0.009 |
| M09-W099 | HS0-3243 x Dennison | H, S, W | YLD | NS | 0.005 |
| M09-W104 | HS0-3243 x Dennison | H, S, W | YLD | FG/HP | 0.01 |
| M09-W105 | HS0-3243 x Dennison | H, S, W | YLD | YLD | 0.033 |
| M09-W106 | HS0-3243 x Dennison | H, S, W | YLD | YLD | 0.019 |
| M09-W109 | HS0-3243 x U01-390489 | H, S, W | YLD | YLD | 0.578 |
| M09-W110 | HS0-3243 x U01-390489 | H, S, W | YLD | NS | 0.578 |
| M09-W111 | HS0-3243 x U01-390489 | H, S, W | YLD | NS | 0.509 |
| M09-W117 | HS0-3243 x U01-390489 | H, S, W | YLD | YLD | 0.434 |
| M09-W118 | HS0-3243 x U01-390489 | H, S, W | YLD | NS | 0.415 |
| M09-W122 | HS0-3243 x U01-390489 | H, S, W | YLD | YLD | 0.523 |
| M09-W124 | IA3017 x Dennison | H, S, W | FA | NS | 0.199 |
| M09-W125 | IA3017 x Dennison | H, S, W | FA | NS | 0.008 |
| M09-W126 | IA3017 x Dennison | H, S, W | FA | FA | 0.027 |
| M09-W127 | IA3017 x Dennison | H, S, W | FA | NS | 0.014 |
| M09-W128 | IA3017 x Dennison | H, S, W | FA | NS | 0.017 |
| M09-W129 | IA3017 x Dennison | H, S, W | FA | FG/HP | 0.02 |
| M09-W130 | IA3017 x Dennison | H, S, W | FA | NS | 0.016 |
| M09-W131 | IA3017 x Dennison | H, S, W | FA | NS | 0.059 |
| M09-W132 | IA3017 x Dennison | H, S, W | FA | NS | 0.069 |
| M09-W133 | IA3017 x Dennison | H, S, W | FA | FA | 0.015 |
| M09-W142 | HS0-3243 x IA 3017 | H, S, W | FA | NS | 0.008 |
| M09-W143 | HS0-3243 x IA 3017 | H, S, W | FA | FA | 0.008 |
| M09-W144 | HS0-3243 x IA 3017 | H, S, W | FA | FA | 0.007 |
| M09-W145 | HS0-3243 x IA 3017 | H, S, W | FA | NS | 0.009 |
| M09-W146 | HS0-3243 x IA 3017 | H, S, W | FA | FA | 0.009 |
| M09-W147 | HS0-3243 x IA 3017 | H, S, W | FA | NS | 0.004 |
| M09-W148 | HS0-3243 x IA 3017 | H, S, W | FA | FA | 0.014 |
| M09-W149 | (HS1-3661 x Wyandot) x Md99-173-11-17 | H, S, W | FA | NS | 0.991 |
| M09-W150 | (HS1-3661 x Wyandot) x Md99-173-11-18 | H, S, W | FA | FA | 0.993 |
| M09-W151 | (HS1-3661 x Wyandot) x Md99-173-11-19 | H, S, W | FA | NS | 0.993 |
| M09-W152 | (HS1-3661 x Wyandot) x Md99-173-11-20 | H, S, W | FA | NS | 0.992 |
| M09-W153 | (HS1-3661 x Wyandot) x Md99-173-11-21 | H, S, W | FA | FG/HP | 0.992 |
| M09-W154 | (HS1-3661 x Wyandot) x Md99-173-11-22 | H, S, W | FA | FA | 0.995 |
| M09-W155 | (HS1-3661 x Wyandot) x Md99-173-11-23 | H, S, W | FA | YLD | 0.983 |
| M09-W156 | (HS1-3661 x Wyandot) x Md99-173-11-24 | H, S, W | FA | FA | 0.99 |
| M09-W157 | (HS1-3661 x Wyandot) x Md99-173-11-25 | H, S, W | FA | NS | 0.993 |
| M09-W158 | (HS1-3661 x Wyandot) x Md99-173-11-26 | H, S, W | FA | FA | 0.989 |
| M09-W159 | (HS1-3661 x Wyandot) x Md99-173-11-27 | H, S, W | FA | FA | 0.992 |
| M09-W160 | Dennison x IA2065 | H, S, W | FA | YLD | 0.144 |
| M09-W161 | Dennison x IA2065 | H, S, W | FA | NS | 0.189 |
| M09-W162 | Dennison x IA2065 | H, S, W | FA | NS | 0.237 |
| M09-W163 | Dennison x IA2065 | H, S, W | FA | NS | 0.177 |
| M09-W166 | Dennison x IA2065 | H, S, W | FA | NS | 0.462 |
| M09-W169 | Dennison x IA2065 | H, S, W | FA | NS | 0.008 |
| M09-W171 | Dennison x IA2065 | H, S, W | FA | NS | 0.134 |
| M09-W172 | HFPR-4 x IA2065 | H, S, W | FA | NS | 0.16 |
| M09-W173 | HFPR-4 x IA2065 | H, S, W | FA | FA | 0.254 |
| M09-W174 | HFPR-4 x IA2065 | H, S, W | FA | FA | 0.272 |
| M09-W175 | HFPR-4 x IA2065 | H, S, W | FA | NS | 0.214 |
| M09-W176 | Dennison x HS1-7531 | H, S, W | FA | NS | 0.234 |
| M09-W177 | Dennison x HS1-7531 | H, S, W | FA | NS | 0.214 |
| M09-W179 | Dennison x HS1-7531 | H, S, W | FA | NS | 0.279 |
| M09-W180 | Dennison x HS1-7531 | H, S, W | FA | YLD | 0.234 |
| M09-W183 | Dennison x HS1-7531 | H, S, W | FA | NS | 0.218 |
| M09-W184 | Dennison x HS1-7531 | H, S, W | FA | NS | 0.235 |
| M09-W195 | HS1-3661 x IA3017 | H, S, W | FA | NS | 0.525 |
| M09-W196 | HS1-3661 x IA3017 | H, S, W | FA | FA | 0.553 |
| M09-W197 | HS1-3661 x IA3017 | H, S, W | FA | YLD | 0.33 |
| M09-W199 | HS1-3661 x IA3017 | H, S, W | FA | NS | 0.58 |
| M09-W201 | HS1-3661 x IA3017 | H, S, W | FA | YLD | 0.566 |
| M09-W202 | HS1-3661 x IA3017 | H, S, W | FA | NS | 0.51 |
| Ohio FG1 | see note below | H, P, S, W | FG/HP | FG/HP | 0.997 |
| Ohio FG3 | see note below | H, P, S, W | FG/HP | FG/HP | 0.897 |
| Ohio FG4¶ | see note below | H, P, S, W | FG/HP | FG/HP | 0.961 |
| Ohio FG5¶ | see note below | H, W | FG/HP | FG/HP | 0.998 |
| OHS 202¶ | see note below | H, S, W | YLD | YLD | 0.525 |
| OHS 305 | see note below | H, S, W | YLD | YLD | 0.151 |
| OHS 306 | see note below | H, P, S, W | FG/HP | FG/HP | 0.95 |
| Prohio¶ | see note below | H, P, S, W | FG/HP | FG/HP | 0.554 |
| Streeter¶ | see note below | H, S, W | YLD | YLD | 0.002 |
| Wyandot¶ | see note below | H, P, S, W | FG/HP | FG/HP | 0.995 |
| Wyandot Rg2 | Wyandot x PI87 | H, P, S, W | FG/HP | FG/HP | 0.998 |
| Wyandot Rg2 Pm | Wyandot x PI87 | H, P, S, W | FG/HP | FG/HP | 0.996 |

Note: for the ten entries Ohio FG1 through Wyandot, the pedigree column lists only nine pedigrees, so their row alignment cannot be recovered. In the order given, they are: LS301 x HS84-6247; HS89-8843 x Ohio FG1; Ohio FG1 x HS89-3078; A95-581028 x PI592926; A98-980047 x Kottman; HS98-3409 x HS99-5021; HC94-81PR x Asgrow2506; U97-3114 x HS97-5261; OXR-96243 x Ohio FG1.
† Locations in Ohio where field trials were conducted for each breeding line or cultivar; H: Hoytville, P: Plain City, S: South Charleston, W: Wooster.
‡ Predefined groups are abbreviated as follows: FA, fatty acid lines; FG/HP, food-grade/high protein lines; YLD, commodity soybean lines; All, all soybean lines in the population; NS, non-selected individuals.
§ Proportion membership to the major subpopulation assigned by STRUCTURE.
¶ Check cultivars grown in multiple trials.

Table 3.2. Correlation coefficients for subpopulations and PCs and mean PC values for groups.

| | Phenotypic PC1 | Phenotypic PC2 | Genotypic PC1 | Genotypic PC2 | Genotypic PC3 |
|---|---|---|---|---|---|
| a. Correlation coefficients | | | | | |
| Major population membership | -0.29*** | -0.37*** | 0.86*** | 0.41*** | 0.15 |
| b. Mean PC values | | | | | |
| All | 0cd | 0bc | 0ab | 0b | 0b |
| Pedigree-FA† | -0.44d | 0.92a | 0.31ab | -2.6c | 2.7a |
| Pedigree-FG/HP | 4.5a | -0.30bcd | -19d | 6.3a | -0.56b |
| Pedigree-YLD | -0.26cd | -0.65cd | 2.2a | 1.1b | -2.0b |
| Phenotypic-All selected | 0.60c | -0.21bcd | -1.4b | 0.57b | -0.066b |
| Phenotypic-FA | -0.62d | 1.2a | -1.1b | -5.3d | 4.5a |
| Phenotypic-FG/HP | 2.6b | -0.73d | -8.7c | 4.2a | -0.65b |
| Phenotypic-YLD | -0.11cd | -0.37bcd | 2.5a | 0.41b | -1.1b |
| Phenotypic-Non-selected | -0.44d | 0.16b | 1.0ab | -0.42bc | 0.049b |

(a) Pearson’s correlation coefficient between the major population membership and PC values for the first two PCs from phenotypic data and the first three PCs from the genotypic data. Significance of Pearson’s correlation coefficient was tested by t-test and corrected for multiple comparisons using Bonferroni’s method; indicated by * for adjusted p ≤ 0.05, ** for adjusted p ≤ 0.01, *** for adjusted p ≤ 0.001. (b) Mean PC values of each predefined group for the aforementioned PCs. Different superscript letters indicate significant differences (p ≤ 0.05; Duncan's test for multiple comparisons) between group means for an individual PC. Group means which are significantly different from the entire population (All) are in bold. † Abbreviations are as described in Table 3.1.

Table 3.3. Eigenvectors for all of the phenotypic variables contributing to phenotypic PC1 and PC2.

| Phenotypic variable | Phenotypic PC1 | Phenotypic PC2 |
|---|---|---|
| Seed straight length | 0.46 | 0.15 |
| Seed volume | 0.46 | 0.08 |
| Seed weight | 0.46 | 0.01 |
| Seed straight width | 0.45 | 0.05 |
| Seed protein content | 0.27 | -0.42 |
| Oil palmitic acid | -0.17 | -0.17 |
| Seed oil content | -0.12 | 0.39 |
| Seed density | -0.12 | -0.29 |
| Oil linoleic acid | -0.10 | 0.21 |
| Seed length:width | 0.07 | 0.41 |
| Oil stearic acid | 0.07 | -0.38 |
| Yield | -0.07 | -0.09 |
| Oil oleic acid | 0.06 | 0.35 |
| Oil linolenic acid | 0.04 | -0.28 |

Absolute values of eigenvectors which are > 70% of the absolute value of the highest eigenvector are shown in bold and contribute highly to the PC (Daultry et al., 1976; Mardia et al., 1979).

Figure 3.1. Plot of ∆K. ∆K is equal to the mean of the absolute value of L’’(K) divided by the standard deviation of L(K), where L(K) is the log likelihood function of the true number of subpopulations (K) and L’’(K) is the second-order rate of change of L(K).

Figure 3.2. Bayesian admixture proportion for individual soybean lines with the K = 2 population model. Vertical bars represent individual lines which are partitioned into 2 (K) colored segments according to their subpopulation membership. Vertical black lines separate soybean lines based on predefined groups. The number within each predefined group area indicates the percent of alleles attributed to the major subpopulation (red). (a) Commodity, modified fatty acid, and food-grade/high protein groups are defined according to their pedigree history. (b) Commodity, modified fatty acid, food-grade/high protein, and non-selected groups are defined according to the observed phenotypes.

Figure 3.3. PCA plot of genotypic data. Individuals are colored in accordance with the STRUCTURE output displayed in Figure 3.2, where individuals with over 90% of alleles being contributed from the major subpopulation are colored red, individuals with over 90% of alleles being contributed from the minor subpopulation are colored green, and admixed individuals are colored black. Abbreviations are as defined in Table 3.1. (a) Groups were predefined based on pedigree information. (b) Groups were predefined based on observed phenotypes.

Figure 3.4. Bar graph displaying the percentage of alleles attributed to the major subpopulation for each group defined on the basis of pedigree or phenotype. Abbreviations are as described in Table 3.1. All: All soybean lines in the population.
Different letters indicate significant differences (p ≤ 0.05; Duncan's test for multiple comparisons).

Figure 3.5. PCA plots of phenotypic data. Individuals are colored in accordance with the STRUCTURE output displayed in Figure 3.2, where individuals with over 90% of alleles being contributed from the major subpopulation are colored red, individuals with over 90% of alleles being contributed from the minor subpopulation are colored green, and admixed individuals are colored black. Abbreviations for groups are as described in Table 3.1. (a) Groups were predefined based on pedigree information. (b) Groups were predefined based on observed phenotypes.

REFERENCES

American Public Health Association. Restricting trans fatty acids in the food supply. http://www.apha.org/advocacy/policy/policysearch/default.htm?id=1366. Retrieved 11-03-2012.
Bachlava E, Dewey RE, Burton JW, Cardinal AJ (2009) Mapping and comparison of quantitative trait loci for oleic acid seed content in two segregating soybean populations. Crop Science. 49(2): 433-442.
Burton JW, Wilson RF, Novitzky W, Carter TE (2004) Registration of ‘Soyola’ soybean. Crop Science. 44(2): 687-688.
Burton JW, Wilson RF, Rebetzke GJ, Pantalone VR (2006) Registration of N98–4445A mid-oleic soybean germplasm line. Crop Science. 46(2): 1010-1012.
Cardon LR, Palmer LJ (2003) Population stratification and spurious allelic association. The Lancet. 361: 598-604.
Daultry S (1976) Principal Components Analysis. Geo Abstracts Limited: East Anglia, Norwich.
Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology. 14: 2611-2620.
Fehr WR, Bahrenfus JB, Walker AK (1984) Registration of Vinton 81 Soybean. Crop Science. 24(2): 384.
Falush D, Stephens M, Pritchard JK (2003) Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics. 164: 1567-1587.
Garris AJ, Tai TH, Coburn J, Kresovich S, McCouch S (2005) Genetic structure and diversity in Oryza sativa L. Genetics. 169: 1631-1638.
Gizlice Z, Carter TE, Burton JW (1994) Genetic base for North American public soybean cultivars released between 1947 and 1988. Crop Science. 34: 1143-1151.
Glaszmann JC, Kilian B, Upadhyaya HD, Varshney RK (2010) Accessing genetic diversity for crop improvement. Current Opinion in Plant Biology. 13(2): 167-173.
Grundy SM, Florentin L, Nix D, Whelan MF (1988) Comparison of monounsaturated fatty acids and carbohydrates for reducing raised levels of plasma cholesterol in man. The American Journal of Clinical Nutrition. 47: 965-969.
Han YP, Xie DX, Teng WL, Zhang SH, Chang W, Li WB (2011) Dynamic QTL analysis of linolenic acid content in different developmental stages of soybean seed. Theoretical and Applied Genetics. 122: 1481-1488.
Hirata TH, Abe J, Shimamoto Y (1999) Genetic structure of the Japanese soybean population. Genetic Resources and Crop Evolution. 46: 441-453.
Hymowitz T, Harlan JR (1983) Introduction of Soybean to North America by Samuel Bowen in 1765. Economic Botany. 37(4): 371-379.
Hyten DL, Choi I-Y, Song Q, Specht JE, Carter TE, Shoemaker RC, Hwang E-Y, Matukumalli LK, Cregan PB (2010) A high density integrated genetic linkage map of soybean and the development of a 1536 universal soy linkage panel for quantitative trait locus mapping. Crop Science. 50: 960-968.
Jones H, Civáň P, Cockram J, Leigh FJ, Smith LM, Jones MK, Charles MP, Molina-Cano J-L, Powell W, Jones G, Brown TA (2011) Evolutionary history of barley cultivation in Europe revealed by genetic analysis of extant landraces. Evolutionary Biology. 11: 320-331.
Keim P, Olson TC, Shoemaker RC (1988) A rapid protocol for isolating soybean DNA. Soybean Genetics Newsletter. 15: 150-152.
Kim KS, Diers BW, Hyten DL, Mian MAR, Shannon JG, Nelson RL (2012) Identification of positive yield QTL alleles from exotic soybean germplasm in two backcross populations. Theoretical and Applied Genetics. 125: 1353-1369.
Krishnan HB (2005) Engineering soybean for enhanced sulfur amino acid content. Crop Science. 45: 454-461.
Kuroda Y, Kaga A, Tomooka N, Vaughan DA (2006) Population genetic structure of Japanese wild soybean (Glycine soja) based on microsatellite variation. Molecular Ecology. 15: 959-974.
Lange CE, Federizzi LC (2009) Estimation of soybean genetic progress in the South of Brazil using multi-environmental yield trials. Scientia Agricola. 66: 309-316.
Lee JD, Bilyeu KD, Shannon JG (2007) Genetics and breeding for modified fatty acid profile in soybean seed oil. Journal of Crop Science and Biotechnology. 10(4): 201-210.
Liu KJ, Goodman M, Muse S, Smith JS, Buckler E, Doebley J (2003) Genetic structure and diversity among maize inbred lines as inferred from DNA microsatellites. Genetics. 165: 2117-2128.
Mardia KV, Kent JT, Bibby JM (1979) Multivariate Analysis. Academic Press: London.
McHale LK, Feller MK, McIntyre SA, Berry SA, St. Martin SK, Dorrance AE (2012) Registration of 'Summit', a high-yielding soybean with race-specific resistance to Phytophthora sojae. Journal of Plant Registrations. doi: 10.3198/jpr2012.01.0012crc.
Miller JF, Zimmerman, Vick BA (1987) Genetic control of high oleic acid content in sunflower oil. Crop Science. 27: 923-926.
Panthee DR, Pantalone VR, Saxton A (2006) Modifier QTL for fatty acid composition in soybean oil. Euphytica. 152(1): 67-73.
Poysa V, Woodrow L, Yu K (2006) Effect of soy protein subunit composition on tofu quality. Food Research International. 39: 309-317.
Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics. 155: 945-959.
Shannon JG, Sleper DA, Arelli DR, Burton JW, Wilson RF, Anand SC (2005) Registration of S01-9269 Soybean Germplasm Line Resistant to Soybean Cyst Nematode with Seed Oil Low in Saturates. Crop Science. 45(4): 1673-1674.
Shi A, Chen P, Zhang B, Hou A (2010) Genetic diversity and association analysis of protein and oil content in food-grade soybeans from Asia and the United States. Plant Breeding. 129: 250-256.
Sim SC, Robbins MD, Deynze AV, Michel AP, Francis DM (2011) Population structure and genetic differentiation associated with breeding history and selection in tomato (Solanum lycopersicum L.). Heredity. 106: 927-935.
St. Martin SK, Calip-DuBois AJ, Fioritto RJ, Schmitthenner AF, Min DB, Yang T-S, Yu YM, Cooper RL, Martin RJ (1996) Registration of ‘Ohio FG1’ Soybean. Crop Science. 26: 813.
St. Martin SK, Feller MK, Fioritto MJ, McIntyre SA, Dorrance AE, Berry SA, Sneller CH (2006) Registration of ‘HS0–3243’ Soybean. Crop Science. 46: 1811.
St. Martin SK, Mills GR, Fioritto RJ, McIntyre SA, Dorrance AE, Berry SA (2006) Registration of ‘Ohio FG5’ Soybean. Crop Science. 46(6): 2709-2709.
St. Martin SK, Mills GR, Fioritto RJ, McIntyre SA, Dorrance AE, Cooper RL (2004) Registration of ‘Ohio FG3’ soybean. Crop Science. 44: 687.
Van Inghelandt D, Melchinger AE, Lebreton C, Stich B (2010) Population structure and genetic diversity in a commercial maize breeding program assessed with SSR and SNP markers. Theoretical and Applied Genetics. 120: 1289-1299.
Wang LF, Ge H, Hao C, Dong Y, Zhang X (2012a) Identifying loci influencing 1,000-kernel weight in wheat by microsatellite screening for evidence of selection during breeding. PLoS ONE. 7(2): e29432.
Wang HL, Swain EW, Kwolek WF, Fehr WR (1983) Effect of soybean varieties on the yield and quality of tofu. Cereal Chemistry. 60(3): 245-248.
Wang HY, Smith KP, Combs E, Blake T, Horsley RD, Muehlbauer GJ (2012b) Effect of population size and unbalanced data sets on QTL detection using genome-wide association mapping in barley breeding germplasm. Theoretical and Applied Genetics. 124: 111-124.
Yan J, Wen-Ju Z, Da-Xu F, Bao-Rong L (2003) Sampling strategy within a wild soybean population based on its genetic variation detected by ISSR markers. Acta Botanica Sinica. 45(8): 995-1002.
Zhang DD, Bai GH, Zhu CS, Yu JM, Carver BF (2010) Genetic diversity, population structure, and linkage disequilibrium in U.S. elite winter wheat. The Plant Genome. 3(2): 117-127.

CHAPTER 4

ASSOCIATION MAPPING OF FOOD-GRADE QUALITY TRAITS IN A SOYBEAN BREEDING PROGRAM FOR COMMODITY AND FOOD-GRADE CULTIVARS

Abstract

Soybean is primarily used as a vegetable oil for human consumption and a high protein feed for livestock. Food-grade soybeans, used to produce tofu, miso, edamame, soymilk, soy sauce, natto and tempeh, are a specialty crop with unique chemical and physical seed quality requirements. We have estimated the proportion of phenotypic variance explained by genetic variance for various soybean seed quality traits and detected significant marker-trait associations through association mapping. Genotypic and phenotypic analyses were conducted on an initial mapping population of 242 breeding lines and cultivars from OSU soybean germplasm. Soybean traits analyzed included seed oil and protein content, volume, weight, density, and seed shape (straight length, straight width, and length to width ratio). Fifty significant marker-trait associations were detected in the initial mapping population using a mixed linear model. Twenty-seven of the significant associations were assessed by one-way ANOVA in an independent confirmation population consisting of 152 breeding lines; 12 were confirmed. Confirmed marker-trait associations included novel loci for seed shape. Because association mapping was conducted in a breeding population, these marker-trait associations can be directly applied in marker-assisted selection.

INTRODUCTION

Soybeans are widely sold as a commodity and processed into vegetable oil and meal (Panthee et al., 2005).
In contrast to commodity soybeans, food-grade soybeans, used to produce tofu, miso, edamame, soymilk, soy sauce, natto and tempeh, are a specialty crop sold at a premium and possessing a unique suite of seed quality requirements (Poysa and Woodrow, 2002; Zhang et al., 2010). These requirements are specifically tailored to the targeted end product and include seed protein, oil, sugar, and secondary metabolite content and composition, as well as seed coat color, hilum color, seed size, and seed shape (Poysa et al., 2006; Shi et al., 2010). The present study focused on several well-studied traits that are crucial to food-grade soybeans (seed weight, protein, and oil) as well as several less-studied traits (seed volume, density, and shape).

Soybean protein and oil content are important not only for food-grade soybean but also for commodity soybean, and have been extensively studied (Diers et al., 1992; Chung et al., 2003; Hyten et al., 2004; Panthee et al., 2005; Clemente and Cahoon, 2009; Bolon et al., 2010; Liang et al., 2010; Shi et al., 2010). Seed protein and oil contents are generally negatively correlated (Liang et al., 2010) and highly heritable (Hyten et al., 2004). To date, 124 quantitative trait loci (QTL) for oil and 108 QTL for protein have been detected (Hyten et al., 2004; Panthee et al., 2005; Liang et al., 2010; Shi et al., 2010).

Seed weight has also been widely studied (Teng et al., 2008). Broad-sense heritability, as estimated from three bi-parental populations, is high (Hoeck et al., 2003). A total of 120 QTL have been reported for seed weight (Soybase, 2012).

Seed volume and seed density have been less studied; however, there is little variation for seed density, and thus seed weight is highly correlated with seed volume (Cai et al., 1997). The heritability of seed volume, estimated from seed length, width, and height in three mapping populations, is moderate to high (Salas et al., 2006).
Seed shape traits, including seed length, width, and length to width ratio, are considered by tofu processors to be critical elements of seed quality (Poysa et al., 2006; Salas et al., 2006). The heritabilities of these traits range from moderate to high (Salas et al., 2006). Only a few studies have conducted QTL analysis of seed shape, and these used a small number of bi-parental populations (Salas et al., 2006).

With few exceptions (e.g. Shi et al., 2010), the QTL for seed quality traits have been identified in bi-parental mapping populations, which, by design, are limited in possible recombination events and in the number of haplotypes represented. Compared to linkage mapping in a conventional bi-parental population, association mapping exploits historical recombination events and increased haplotype diversity, thus leading to a relatively higher mapping resolution (Zhu et al., 2008). The creation of an inbred bi-parental mapping population requires multiple generations of single seed descent; in the context of a breeding program, however, association mapping can use research resources and time efficiently by mapping in existing individuals that are already part of the breeding population. In addition, the results of association mapping can be applied more directly to the breeding program, because it can be conducted in wider germplasm than a bi-parental population, whose results tend to be contextual and applicable only in the mapping population or in individuals with a similar genetic base (Zhu et al., 2008).

To date, many studies have reported using association mapping in breeding programs to identify loci controlling complex traits with moderate to high heritability. Association mapping studies conducted in a soft winter wheat (Triticum aestivum L.) breeding population identified loci associated with kernel size (Breseghello and Sorrells, 2006). In a maize (Zea mays L.)
breeding population, loci were identified for grain yield and moisture (Liu et al., 2011). Loci contributing to resistance to Fusarium head blight were confirmed and mapped more precisely in an association mapping study conducted in a barley (Hordeum vulgare L.) breeding population (Massman et al., 2011). In soybean, association mapping in breeding populations has been used to identify loci associated with iron deficiency chlorosis (Wang et al., 2008). The usefulness of an allele for selection and improvement of the associated trait can be predicted from the frequency of the allele in the population, the contribution of the locus to the total variance, and the robustness of the marker-trait association, as determined by confirmation of the association in multiple populations and environments (Wang et al., 2008). These studies and others have shown that association mapping conducted in a breeding population can lead to the direct use of marker-trait linkage in a marker-assisted selection program.

MATERIALS AND METHODS

Initial population

Genotypic and phenotypic analyses were conducted on breeding lines and cultivars from OSU soybean germplasm. The first mapping population consisted of 242 lines and is described in Chapter 3.

Confirmation population

An independent confirmation population of 152 breeding lines was also selected from the OSU soybean germplasm (Table 4.1). This confirmation population consisted of a set of 152 F4:6 lines that did not overlap with the initial mapping population. Breeding lines were grown with check cultivars in three field tests (OPTA, OPTB1, and OPTB2) in the summer of 2011, using the field design described in Chapter 3.

Phenotypic data collection

Phenotypic data collected included seed protein and oil concentration, seed weight, volume, density, length, width, and length to width ratio.
The collection of these data from the initial mapping population is described in Chapter 3. For the confirmation population, phenotypic data were collected in the same manner as for the initial population, with two minor changes: block replicates were included for the seed shape measurements, and the near-infrared spectroscopy for protein and oil measurements was performed using a Perten DA7200 Feed Analyzer (Perten Instruments, Stockholm, Sweden).

Statistical analysis of phenotypes

For each trait, the genotypic effect of each line was estimated as a Best Linear Unbiased Prediction (BLUP) value, calculated using SAS v. 9.2 (SAS Institute Inc., Cary, NC) as described in Chapter 3. The genetic variance and the ratio of genetic variance to total phenotypic variance were estimated for each trait. Genotype and location were considered random in SAS PROC MIXED:

$$h^2 = \frac{\sigma^2_G}{\sigma^2_G + \sigma^2_{GL}/L + \sigma^2_e/(\mathrm{Rep} \times L)}$$

where $h^2$ is the heritability, $\sigma^2_G$ is the genetic variance, $\sigma^2_{GL}$ is the genotype by location variance, $\sigma^2_e$ is the error variance, Rep is the number of field block replicates, and $L$ is the number of locations (Nyquist, 1991).

Genotypic data collection

A total of 504 markers were used for the first mapping population (Chapter 3). Briefly, these markers were a subset of 768 Illumina GoldenGate markers assayed on the BeadXpress (Chapter 3). Uninformative markers were removed to achieve the final data set of 504 markers genotyped on 242 individuals (Chapter 3). Missing values were imputed using fastPHASE (Stephens and Donnelly, 2003; Scheet and Stephens, 2006). Markers were evenly distributed across all chromosomes, with an average gap distance of 4.3 cM (Chapter 3). The confirmation population was genotyped with a set of 384 GoldenGate markers (Illumina Inc., San Diego, CA), described in Chapter 3 as the second set of markers.
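The variance ratio defined in the statistical analysis of phenotypes can be illustrated with a minimal Python sketch. The variance components used here are hypothetical values chosen for illustration only; they are not estimates from this study, and the function name `variance_ratio` is an assumption introduced for this example. The sketch also assumes that the error variance is divided by the product Rep x L.

```python
def variance_ratio(var_g, var_gl, var_e, n_locations, n_reps):
    """Ratio of genetic to phenotypic variance on an entry-mean basis.

    Implements var_g / (var_g + var_gl / L + var_e / (Rep * L)).
    """
    if n_locations < 1 or n_reps < 1:
        raise ValueError("n_locations and n_reps must be at least 1")
    phenotypic = var_g + var_gl / n_locations + var_e / (n_reps * n_locations)
    if phenotypic <= 0:
        raise ValueError("phenotypic variance must be positive")
    return var_g / phenotypic


# Hypothetical variance components (illustration only, not thesis data).
h2 = variance_ratio(var_g=4.0, var_gl=1.0, var_e=6.0, n_locations=2, n_reps=3)
print(round(h2, 3))  # 4.0 / (4.0 + 0.5 + 1.0) -> 0.727
```

In practice the variance components would come from a mixed-model fit (PROC MIXED in this study); the function only performs the final arithmetic.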
The set of 384 GoldenGate markers included 14 markers found to have significant associations with traits in the initial mapping population. Genotyping of the confirmation population with GoldenGate markers was conducted using a BeadXpress (Illumina Inc.) at the Molecular and Cellular Imaging Center at the Ohio Agricultural Research and Development Center of the Ohio State University (Chapter 3).

Association mapping

For the initial mapping population, the software STRUCTURE was used to cluster the population into sub-groups and generate the Q-matrix (Chapter 3). The software TASSEL (Bradbury et al., 2007) was used to conduct association mapping with the unified mixed model (Yu and Buckler, 2005):

$$Y = X\beta + M\alpha + Qw + Ku + e$$

where $Y$ is the phenotypic score, $X\beta$ represents fixed effects other than SNP markers and population structure, $M\alpha$ represents marker effects, $Qw$ represents population structure, $Ku$ represents familial relatedness, and $e$ is the error term. The kinship matrix required for association mapping was generated directly in TASSEL (Bradbury et al., 2007). For the confirmation population, single-marker one-way ANOVA was conducted for a subset of 27 marker-trait associations that were significant in the initial population.

RESULTS

The ratio of genetic variance to phenotypic variance ($\sigma^2_G/\sigma^2_P$) ranged from moderate to high for all traits except seed density and seed straight length to width ratio (Table 4.2). For traits with moderate to high $\sigma^2_G/\sigma^2_P$, the ratio was consistent between the initial and confirmation populations (Table 4.2).

A total of 52 significant marker-trait associations were detected for 30 markers. The number of markers significantly associated with each trait ranged from 1 to 11 (Table 4.3).
Seed straight length, which had a moderate $\sigma^2_G/\sigma^2_P$ ratio in the initial mapping population (Table 4.2), had 11 significant marker-trait associations, the largest number detected for any trait in the initial mapping population (Table 4.3). Marker-trait associations were found on 14 of the 20 chromosomes and were distributed into 22 genetic regions or clusters (Table 4.3; Figure 4.3). These regions were named according to their linkage group to facilitate discussion within this study (Table 4.3). Limiting the familywise Type I error rate to 0.1% resulted in 50 significant marker-trait associations (Table 4.3; Benjamini and Hochberg, 1995). The proportion of total variance explained by each significant marker ($R^2$) ranged from 5.4% to 20.0%. Markers with the largest effects include BARC-016485-02069 on chr. 3 at 61.5 cM, associated with seed length, width, and volume, and Gmax7x61_2757538 on chr. 20 at 19 cM, associated with seed protein and oil concentrations (Table 4.3; Hyten et al., 2008; Hyten et al., 2010). It should be noted that the MLM used for the association mapping estimates $R^2$ independently for each marker (Bradbury et al., 2007). Thus, as a result of linkage disequilibrium, the sum of $R^2$ for a trait is often greater than one.

Twenty-seven of the 50 significant marker-trait associations were tested using single-marker ANOVA in the confirmation population; 11 of these involved markers that were monomorphic or had a low minor allele frequency. In total, 12 of the 16 polymorphic significant marker-trait associations tested in the confirmation population were confirmed (Table 4.3). The confirmed marker-trait associations involved seed length, width, length to width ratio, volume, and weight. No markers with a significant association for seed oil or protein concentration were tested in the confirmation population (Table 4.3).

Although LD decayed rapidly in this population (Figure 4.1), there is evidence of LD between both linked and unlinked markers (e.g. Figure 4.3).
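The multiple-testing correction cited above (Benjamini and Hochberg, 1995) can be sketched as follows. This is a minimal illustration of the standard Benjamini-Hochberg step-up procedure, which controls the false discovery rate; whether the thesis applied exactly this procedure to the MLM p-values is an assumption, and the p-values below are hypothetical.

```python
def benjamini_hochberg(p_values, q):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up rule: find the largest rank k (1-based, p-values sorted
    ascending) with p_(k) <= k * q / m, then reject ranks 1..k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            max_rank = rank
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values (illustration only, not thesis data).
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p, q=0.05))
# Only the two smallest p-values fall below their rank-based thresholds.
```

Because the rule is step-up, a p-value above its own threshold is still rejected when a larger p-value at a higher rank passes its threshold.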
Five regions of marker-trait associations were in LD with one or two other, physically unlinked regions (Figure 4.3). Analysis of a subset of markers in an independent confirmation population allowed further interpretation of the validity of these marker-trait associations. LD exists between genetic regions N-1 (seed length, width, volume, weight) and C1-1 (seed length, width, volume, weight) (Table 4.3; Figure 4.3). Marker BARC-016485-02069 in region N-1 and markers BARC-016519-02081, BARC-021219-04011, and Gmax7x194_694393 in region C1-1 were tested in the confirmation population. BARC-016519-02081 and BARC-021219-04011 were confirmed to be significantly associated with seed length, width, and weight; BARC-021219-04011 was also confirmed to be associated with seed volume. LD was also detected between genetic regions K-1 and O-3, both of which were associated with seed length to width ratio in the initial population (Table 4.3; Figure 4.3). Both of these markers were tested in the confirmation population; the association with seed length to width ratio was confirmed for the marker in region K-1 but not for the marker in region O-3 (Table 4.3, Figure 4.2). LD was detected between genetic regions J-1 and D2-1, but neither marker was tested in the confirmation population (Table 4.3, Figures 4.1 and 4.2).

DISCUSSION

In this study, an independent confirmation population was used to evaluate 27 of the 50 significant marker-trait associations, and 12 of them were confirmed. Two significant marker-trait associations for seed density were tested in the confirmation population and neither was confirmed (Table 4.3). These “spurious” associations detected in the initial mapping population may be attributed to the low contribution of genetic variance to phenotypic variance (Table 4.2) and serve to emphasize the need to confirm QTL in an independent study.
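The single-marker one-way ANOVA used for confirmation can be sketched in plain Python. This is a minimal illustration, assuming phenotype values are grouped by marker genotype class; the data are hypothetical, the function name is introduced for this example, and the $R^2$ computed here (between-class sum of squares over total sum of squares) is the ANOVA analogue, not the value reported by the TASSEL MLM.

```python
def one_way_anova(groups):
    """One-way ANOVA for phenotype values grouped by marker genotype.

    Returns (F, df_between, df_within, r_squared).
    """
    groups = [list(g) for g in groups if len(g) > 0]
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two genotype classes and replication")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-class variation; F is undefined")
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    r_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, r_squared


# Hypothetical seed-length values for lines carrying allele A or allele B.
allele_a = [5.0, 6.0, 7.0]
allele_b = [8.0, 9.0, 10.0]
f_stat, df_b, df_w, r2 = one_way_anova([allele_a, allele_b])
print(f_stat, df_b, df_w, round(r2, 3))  # F = 13.5 on (1, 4) df
```

The p-value follows from the upper tail of the F distribution with (df_between, df_within) degrees of freedom; if SciPy is available, `scipy.stats.f.sf(f_stat, df_b, df_w)` gives it, and the association would be declared confirmed when that value is below α = 0.05.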
On chromosome 4, markers at 15.6 cM and 20.3 cM were confirmed to be significantly associated with seed straight length, straight width, and seed weight (Table 4.3). The marker at 20.3 cM was also confirmed to be significantly associated with seed volume (Table 4.3). These loci correspond to known QTL for length, volume, and weight (Mian et al., 1996; Salas et al., 2006). While no QTL have been reported for seed width in this region, the significant association of these two markers with width is likely due to the contribution of the gene(s) to overall seed size and not to seed width per se.

A total of five QTL for length to width ratio were detected and confirmed to be significant in both populations, and four of them were newly detected in the present study (Table 4.3). These loci are located on chromosomes 4, 9, 12, 13, and 14; those on chromosome 4 were close to a region of previously published QTL for length to width ratio (Salas et al., 2006) (Table 4.3). Markers at 25.7 cM on chromosome 9 and 95.9 cM on chromosome 10 were significantly associated with length to width ratio and were in LD in the initial population (Figure 4.2). Only the marker on chromosome 9 was confirmed to be significantly associated with length to width ratio in the confirmation population. Given that soybean lines within a breeding population can be in admixture (Lam et al., 2010; Chapter 3), it is likely that the inter-chromosomal LD patterns differ between the two populations.

The QTL for protein and oil on chromosome 20 (I) has been widely reported and was verified in the initial mapping population (Chung et al., 2003; Nichols et al., 2006). However, the markers associated with this chromosome 20 QTL for protein were not assayed in the confirmation population (Figure 4.3).
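The pairwise LD statistic referred to throughout this chapter, the squared correlation coefficient between two biallelic markers, can be sketched as follows. This is a minimal illustration that assumes inbred lines, so that each line contributes one two-locus haplotype with alleles coded 0 or 1; the data are hypothetical, and the thesis itself used Haploview rather than this code.

```python
def ld_r2(haplotypes):
    """Squared correlation (r^2) between two biallelic loci.

    haplotypes: list of (a, b) tuples, alleles coded 0 or 1 at each locus.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom


# Hypothetical haplotypes (illustration only, not thesis data).
complete_ld = [(1, 1)] * 4 + [(0, 0)] * 4
no_ld = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_r2(complete_ld), ld_r2(no_ld))  # 1.0 and 0.0
```

Under this definition, a marker pair would be counted as being in LD in the sense of Figure 4.3 when the returned value exceeds the 0.1 threshold.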
In this study, LD decayed below the critical $r^2$ of 0.1 within a genetic distance of approximately 1 cM (Figure 4.1), a more rapid decay than the > 2.5 cM previously reported (Zhu et al., 2003). Thus, an increased marker density might be required to capture all functional sites. Though association mapping can effectively detect phenotypic associations with common alleles, it is less efficient in detecting rare alleles (Yu and Buckler, 2006). As such, even with complete marker saturation, association mapping in this population is unlikely to detect all large-effect QTL, and there is a need to investigate traits using larger association mapping panels or a combination of bi-parental and association mapping populations. However, the marker-trait associations that were detected in both the initial mapping population and the confirmation population are expected to be robust and directly applicable in marker-assisted selection.

Credits: Genotyping was conducted by MCIC. Seed protein and oil measurements were conducted by NCAUR for the initial mapping population and by Mr. Scott McIntyre for the confirmation population. All other work was conducted by Mao Huang with assistance from members of the McHale lab for seed size and shape measurements. Dr. David Francis and Dr. Steve St. Martin assisted with statistical data analysis.

TABLES AND FIGURES

Breeding line or cv.    Pedigree                   Reference†
‘Dennison’‡             -                          St. Martin et al., 2008
HS0-3243‡               -                          St. Martin et al., 2006
‘IA3024’‡               -                          Iowa State Univ.
‘OHS 202’‡              -                          OSU-OARDC
OHS 307                 -                          -
‘Prohio’‡               -                          Mian et al., 2008
‘Streeter’‡             -                          OSU-OARDC
M10-A010                Dennison x HS4-2973
M10-A012                Dennison x HS4-2973
M10-A021                LG00-3372 x Wyandot
M10-A033                LG00-3372 x Wyandot
M10-A034                LG00-3372 x Wyandot
M10-A054                Dennison x HS3-2669
M10-A078                LD00-3309 x HS4-2973
M10-B018                Dennison x HS3-2669
M10-B025                Dennison x HS3-2669
M10-B031                Dennison x HS3-2669
M10-B032                Dennison x HS3-2669
M10-B034                Dennison x HS3-2669
M10-B035                Dennison x HS3-2669
M10-B036                Dennison x HS3-2669
M10-B037                Dennison x HS3-2669
M10-B054                OHS 202 x HS4-9864
M10-B082                HS4-2973 x HS4-9864
M10-B090                HS4-2973 x HS4-9864
M10-C001                HS4-2973 x HS4-9864
M10-C003                HS4-2973 x HS4-9864
M10-C005                HS4-2973 x HS4-9864
M10-C006                HS4-2973 x HS4-9864
M10-C007                HS4-2973 x HS4-9864
M10-C011                HS4-2973 x HS4-9864
M10-C012                HS4-2973 x HS4-9864
M10-C016                HS4-2973 x HS4-9864
M10-C063                HS3-2523 x Dennison
M10-C069                HS3-2523 x Dennison
M10-C072                HS3-2523 x Dennison
M10-C073                HS3-2523 x Dennison
M10-C077                HS3-2523 x Dennison
M10-C081                HS3-2523 x Dennison
M10-D011                HS4-2915 x HS3-2669
M10-D017                HS4-2915 x HS3-2669
M10-D025                HS4-2915 x HS3-2669
M10-D028                Wyandot x HS3-2669
M10-D049                HS2-4225 x HS4-2973
M10-D050                HS2-4225 x HS4-2973
M10-D054                HS2-4225 x HS4-2973
M10-D061                HS3-2669 x HS4-9232
M10-D064                HS3-2669 x HS4-9232
M10-D065                HS3-2669 x HS4-9232
M10-D067                HS3-2669 x HS4-9232
M10-D071                HS3-2669 x HS4-9232
M10-D073                HS3-2669 x HS4-9232
M10-D074                HS3-2669 x HS4-9232
M10-D075                HS3-2669 x HS4-9232
M10-D076                HS3-2669 x HS4-9232
M10-D078                HS3-2669 x HS4-9232
M10-D079                HS3-2669 x HS4-9232
M10-D080                HS3-2669 x HS4-9232
M10-D081                HS3-2669 x HS4-9232
M10-D084                HS3-2669 x HS4-9232
M10-D085                HS3-2669 x HS4-9232
M10-D086                HS3-2669 x HS4-9232
M10-E003                HS3-2669 x HS4-9232
M10-E030                OHS 303 x HS4-9232
M10-E032                OHS 303 x HS4-9232
M10-E033                OHS 303 x HS4-9232
M10-E034                OHS 303 x HS4-9232
M10-E035                OHS 303 x HS4-9232
M10-E036                OHS 303 x HS4-9232
M10-E037                OHS 303 x HS4-9232
M10-E038                OHS 303 x HS4-9232
M10-E059                IA2065 x OHS 303
M10-E061                IA2065 x OHS 303
M10-E062                IA2065 x OHS 303
M10-E063                IA2065 x OHS 303
M10-E064                IA2065 x OHS 303
M10-E065                IA2065 x OHS 303
M10-E066                IA2065 x OHS 303
M10-E069                IA2065 x OHS 303
M10-F002                HS1-36612 x N98-4445A
M10-F003                HS1-36612 x N98-4445A
M10-F009                HS1-36612 x N98-4445A
M10-F043                Dennison x HS5-1089-11
M10-F048                Dennison x HS5-1089-11
M10-F050                Dennison x HS5-1089-11
M10-F061                Dennison x HS5-1089-14
M10-F078                Dennison x HS3-2669
M10-F081                IA2065 x OHS 303
M10-F083                HS4-2973 x HS4-9864
M10-F087                LD00-3309 x HS4-2973
M10-F089                HS3-2523 x Dennison
M10-W056                U01-390489 x HS4-9908
M10-W059                U01-390489 x HS4-9908
M10-W081                Dennison x HS4-9864
M10-W083                Dennison x HS4-9864
M10-W100                HS4-9864 x HS3-2669
M10-W102                HS4-9864 x HS3-2669
M10-W106                HS4-9864 x HS3-2669
M10-W107                HS4-9864 x HS3-2669
M10-W108                HS4-9864 x HS3-2669
M10-W109                HS4-9864 x HS3-2669
M10-W111                HS4-9864 x HS3-2669
M10-W114                HS4-5450 x OHS 303
M10-W115                HS4-5450 x OHS 303
M10-W116                HS4-5450 x OHS 303
M10-W117                HS4-5450 x OHS 303
M10-W118                HS4-5450 x OHS 303
M10-W119                HS4-5450 x OHS 303
M10-W120                HS4-5450 x OHS 303
M10-W121                HS4-5450 x OHS 303
M10-W125                HS4-5450 x OHS 303
M10-W127                HS4-5450 x OHS 303
M10-W130                HS4-5450 x OHS 303
M10-W166                HS3-2669 x Dennison
M10-W168                HS3-2669 x Dennison
M10-W169                HS3-2669 x Dennison
M10-W170                HS3-2669 x Dennison
M10-W171                HS3-2669 x Dennison
M10-W174                HS3-2669 x Dennison
M10-W175                HS3-2669 x Dennison
M10-W185                HS3-2669 x Dennison
M10-W225                HS3-2523 x Dennison
M10-W226                HS3-2523 x Dennison
M10-W227                HS3-2523 x Dennison
M10-W228                HS3-2523 x Dennison
M10-W232                HS3-2523 x Dennison
M10-W237                HS3-2523 x Dennison
M10-W241                HS3-2523 x Dennison
M10-W242                HS3-2523 x Dennison
M10-W244                HS3-2523 x Dennison
M10-W268                Dennison x HS4-5426
M10-W269                Dennison x HS4-5426
M10-W295                IA2065 x HS3-2669
M10-W297                IA2065 x HS3-2669
M10-W299                IA2065 x HS3-2669
M10-W303                IA2065 x HS3-2669
M10-W312                IA2065 x HS3-2669
M10-W314                IA2065 x HS3-2669
M10-W335                HS5-1134-21 x HS3-2669
M10-W336                HS5-1134-21 x HS3-2669
M10-W337                HS5-1134-21 x HS3-2669
M10-W344                HS5-1134-21 x HS3-2669
M10-W345                HS5-1134-21 x HS3-2669
M10-W346                HS5-1134-21 x HS3-2669
M10-W348                HS5-1134-21 x HS3-2669
M10-W354                HF03-534 x IA3024
M10-W357                HF03-534 x IA3024
M10-W369                OHS 202 x HS5-1112-44
M10-W370                OHS 202 x HS5-1112-44
M10-W371                OHS 202 x HS5-1112-44
M10-W372                OHS 202 x HS5-1112-44
M10-W373                OHS 202 x HS5-1112-44

Table 4.1. Confirmation population breeding lines and their pedigrees, as well as lines used as checks.
† Lines that have been released as a cultivar have references listed.
‡ Check cultivars and breeding lines that were grown in multiple trials but were not used as part of the independent confirmation population.

Trait           Initial population $\sigma^2_G/\sigma^2_P$    Confirmation population $\sigma^2_G/\sigma^2_P$
Protein         0.96                                          0.83
Oil             0.93                                          0.89
Weight          0.90                                          0.85
Volume          0.81                                          0.78
Density         0.09                                          0.15
Length          0.62                                          0.56
Width           0.53                                          0.36
Length:width    0.22                                          0.42

Table 4.2. Proportion of the observed phenotypic variance ($\sigma^2_P$) explained by genetic variance for BLUP values of seed traits in the initial and confirmation populations.

Chr.
(LG) PositionConsensus 4.0 map (cM) PositionComposite Genetic Marker 2003 map region † (cM) 2(D1b) 116.5 116.3- 121.0 D1b-1 BARC-028373-05856 3(N) 61.3 75.3-78.9 N-1 BARC-028205-05791 3(N) 61.5 75.3 N-1 BARC-016485-02069 4(C1) 15.6 28.4 C1-1 BARC-016519-02081 4(C1) 19.5 30.7 C1-1 BARC-031733-07217 4(C1) 20.3 30.7 C1-1 BARC-021219-04011 4(C1) 27.7 42.9- 44.5 C1-1 BARC-014361-01331 4(C1) 31.1 27.7 C1-1 Gmax7x194_694393 4(C1) 43.5 59.9- 67.0 C1-2 BARC-024445-04886 4(C1) 63.9 C1-3 BARC-044523-08716 5(A1) 2.5 0-2.0 A1-1 BARC-040651-07808 5(A1) 7(M) 7(M) 30.9 24.5 68.7 31.1-31.4 A1-2 23.4- 33.2 M-1 71.7-77.23 M-2 BARC-053559-11912 BARC-054347-12492 BARC-047995-10452 9(K) 25.7 10(O) 10(O) 10(O) 29.4 54.4 82.5 40.7-44.6 24.6-33.2 51.9-57.0 85.7 10(O) 95.9 100.4 87.3 K-1 Gmax7x85_2848025 O-1 O-2 O-3 BARC-065789-19751 BARC-022175-04293 Gmax7x141_1711038 O-3 Gmax7x300_135386 Trait density length volume weight length width volume weight length width volume weight length width volume weight length width volume weight length weight length width volume weight length length: width density length oil density length length: width density oil protein length: width IP‡ MLM (p-value) 2.90E-05 3.78E-08 8.48E-04 7.87E-06 1.86E-12 2.00E-07 4.97E-08 2.44E-08 3.93E-10 2.72E-05 1.96E-06 3.94E-08 3.56E-09 1.52E-04 7.57E-06 3.31E-06 3.56E-09 1.52E-04 7.57E-06 3.31E-06 9.90E-06 3.35E-04 6.62E-09 3.17E-04 2.90E-05 5.86E-06 2.00E-04 QTL CP§ positions¶ R2 for IP ANOVA Composite MLM (p2003 map value) (cM) 0.084 n.d. None 0.132 n.a. 84.6-102 0.057 n.a. None 0.093 n.a. 78.5-84.5 0.200 n.d. 84.6-102 0.119 n.d. None 0.131 n.d. None 0.136 n.d. 78.51-84.51 0.164 0.0006 21.0-33.3 0.083 0.0015 None 0.104 0.077 10.3-65.1 0.132 0.042 17.6-19.6 0.149 n.a. 21.0-33.3 0.070 n.a. None 0.094 n.a. 10.3-65.1 0.100 n.a. 32.3-34.3 0.149 0.0288 21.0-33.3 0.070 0.0253 None 0.094 0.01 10.3-65.1 0.100 0.005 32.3-34.3 0.091 n.a. 21.0-33.3 0.064 n.a. 32.3-34.3 0.144 n.d. 21.0-33.3 0.064 n.d. None 0.083 n.d. 
10.3-65.1 0.095 n.d. 32.3-34.3 0.068 10.3-65.1 2.12E-04 0.068 2.25E-04 3.86E-04 2.05E-04 2.53E-06 2.48E-05 0.068 0.063 0.069 0.103 0.084 2.30E-04 0.055 7.20E-06 5.71E-04 6.48E-06 0.095 0.061 0.081 3.35E-05 0.082 Reference Present study Salas et al., 2006 Present study Chen et al., 2007 Salas et al., 2006 Present study Present study Chen et al., 2007 Salas et al., 2006 Present study Salas et al., 2006 Mian et al., 1996b Salas et al., 2006 Present study Salas et al., 2006 Orf et al., 1999a Salas et al., 2006 Present study Salas et al., 2006 Orf et al., 1999a Salas et al., 2006 Orf et al., 1999a Salas et al., 2006 Present study Salas et al., 2006 Orf et al., 1999a Salas et al., 2006 0.0024 90.72-122.62 Salas et al., 2006 n.a. n.a. n.d. n.a. n.a. None None 29.3-31.3 None 62.3-71.7 0.0032 None n.a. n.a. n.d. None 49.7-54.2 58.4-106 0.1287 None Present study Present study Mansur et al., 1996 Present study Salas et al., 2006 Present study Present study Panthee et al., 2005 Chen et al., 2007 Present study

Table 4.3. Significant marker-trait associations.
† Position may be estimated according to neighboring markers.
‡ IP, initial population. Bold values are significantly associated with markers, with the familywise error rate limited to 0.1% using the Benjamini-Hochberg method.
§ CP, confirmation population. Bold values are significantly associated with phenotypes in the confirmation population according to single-marker ANOVA (α = 0.05).
¶ Previously published QTL.
n.a., not assayed in the confirmation population.
n.d., monomorphic or low minor allele frequency.

Table 4.3 continued
12(H) 26.9 12(H) 62 12(H) 62.1 12(H) 101.1 27.6-28.8- H-1 BARC-016807-02334 H-2 BARC-018973-03046 62.6-67.8 H-2 BARC-061985-17608 108.2 H-3 BARC-039237-07479 62.6 13(F) 64.1 77.7- 82.8 F-2 BARC-061189-17109 14(B2) 12.6 14.7 B2-1 BARC-061557-17270 16(J) 16(J) 8.2 25.4 3.81- 11.74 J-1 22.1 J-1 BARC-063377-18348 BARC-020505-04644 17(D2) 33.4 34.1- 39.3 D2-1 BARC-054249-12398 18(G) 70.6 72.8 G-1 BARC-024489-04936 length: width length: width length: width protein length: width density length: width density length protein density width 8.70E-06 0.092 5.54E-07 0.113 8.05E-05 0.076 1.67E-04 0.070 9.89E-05 0.074 5.03E-07 0.115 2.30E-04 0.055 1.45E-08 1.38E-04 3.58E-04 6.13E-07 5.95E-04 0.141 0.071 0.064 0.113 0.059 0.8598 None n.a. None Present study 0.0029 None Present study n.a. 123-125 0.0008 None 0.67 Present study Present study n.a. n.a. n.a. n.a. n.a. None None None None None 27.0-39.0 20(I) 19 22.9-37.6 I-1 0.116 n.a. Gmax7x61_2757538 30.6-34.7 31.4-33.4 34.2-36.2 36.0-49.3 36.4-36.9 37.1-39.1 21.0-23.0 31.4-33.4 31.4-33.4 protein 8.30E-12 0.191 n.a. 31.4-33.4 35.9-37.9 35.9-37.9 36.4-36.9 37.1-39.1 Present study None 21.8-23.8 4.14E-07 Qiu et al., 1999 0.0025 None 21.0-23.0 oil Present study Present study Present study Present study Present study Present study Reinprecht et al., 2006 Csanadi et al., 2001 Reinprecht et al., 2006 Qi et al., 2011 Sebolt et al., 2000 Specht et al., 2001 Qi et al., 2011 Chung et al., 2003 Diers et al., 1992 Reinprecht et al., 2006 Sebolt et al., 2000 Brummer et al., 1997 Diers et al., 1992 Tajuddin et al., 2003 Tajuddin et al., 2003 Chung et al., 2003 Diers et al., 1992

Figure 4.1. LD decay for the initial mapping population.

Figure 4.2. Manhattan plots of the MLM result for marker associations with seed traits. The dashed lines indicate a p-value threshold of 0.001.
Markers are in genetic order (cM) across the x-axis; vertical bars separate chromosomes.

Figure 4.3. Display of LD on selected chromosomes. Markers are organized in genetic order; pairwise LD, measured as the squared correlation coefficient ($r^2$), is shown by the level of shading in each diamond. Black diamonds are loci between which $r^2 = 1$; grey diamonds indicate $0 < r^2 < 1$; white diamonds indicate $r^2 = 0$. Red lines highlight all unlinked marker pairs with $r^2 > 0.1$ and a significant association to a trait in the initial mapping population. Image was generated with Haploview 4.2 (Barrett et al., 2005).

REFERENCES

Barrett JC, Fry B, Maller J, Daly MJ (2005) Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 21(2): 263-265.
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B. 57: 289-300.
Bradbury PJ, Zhang ZZ, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES (2007) TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 23: 2633-2635.
Breseghello F, Sorrells ME (2006) Association mapping of kernel size and milling quality in wheat (Triticum aestivum L.) cultivars. Genetics. 172(2): 1165-1177.
Brummer EC, Graef GL, Orf JH, Wilcox JR, Shoemaker RC (1997) Mapping QTL for seed protein and oil content in eight soybean populations. Crop Science. 37(2): 370-378.
Cai TD, Chang KC, Shih MC, Hou HJ, Ji M (1997) Comparison of bench and production scale methods for making soymilk and tofu from 13 soybean varieties. Food Research International. 30(9): 659-668.
Chen Q, Zhang Z, Liu C, Xin D, Qiu H, Shan D, Shan C, Hu G (2007) QTL analysis of major agronomic traits in soybean. Agricultural Sciences in China. 6(4): 399-405.
Chung J, Babka HL, Graef GL, Staswick PE, Lee DJ, Cregan PB, Shoemaker RC, Specht JE (2003) The seed protein, oil, and yield QTL on soybean linkage group I. Crop Science. 43: 1053-1067.
Clemente TE, Cahoon EB (2009) Soybean oil: genetic approaches for modification of functionality and total content. Plant Physiology. 151: 1030-1040.
Csanadi G, Vollmann J, Stift G, Lelley T (2001) Seed quality QTLs identified in a molecular map of early maturing soybean. Theoretical and Applied Genetics. 103(6-7): 912-919.
Diers BW, Keim P, Fehr WR, Shoemaker RC (1992) RFLP analysis of soybean seed protein and oil content. Theoretical and Applied Genetics. 83: 608-612.
Hoeck JA, Fehr WR, Shoemaker RC, Welke GA, Johnson SL, Cianzio SR (2003) Molecular marker analysis of seed size in soybean. Crop Science. 43: 68-74.
Hyten DL, Pantalone VR, Sams CE, Saxton AM, Landau-Ellis D, Stefaniak TR, Schmidt ME (2004) Seed quality QTL in a prominent soybean population. Theoretical and Applied Genetics. 109: 552-561.
Hyten DL, Song Q, Choi I-Y, Yoon M-S, Specht JE, Matukumalli LK, Nelson RL, Shoemaker RC, Young ND, Cregan PB (2008) High-throughput genotyping with the GoldenGate assay in the complex genome of soybean. Theoretical and Applied Genetics. 116(7): 945-952.
Hyten DL, Choi I-Y, Song Q, Specht JE, Carter TE, Shoemaker RC, Hwang E-Y, Matukumalli LK, Cregan PB (2010) A high density integrated genetic linkage map of soybean and the development of a 1536 universal soy linkage panel for quantitative trait locus mapping. Crop Science. 50: 960-968.
Lam HM, Xu X, Liu X, Chen WB, Yang GH et al. (2010) Resequencing of 31 wild and cultivated soybean genomes identifies patterns of genetic diversity and selection. Nature Genetics. 42(12): 1053-1059.
Liang HZ, Yu YL, Wang SF, Lian Y, Wang TF, Wei YL, Gong PT, Liu XY, Fang XJ, Zhang MC (2010) QTL mapping of isoflavone, oil and protein contents in soybean (Glycine max L. Merr.). Agricultural Sciences in China. 9: 1108-1116.
Liu W, Gowda M, Steinhoff J, Maurer HP, Würschum T, Longin CFH, Cossic F, Reif JC (2011) Association mapping in an elite maize breeding population. Theoretical and Applied Genetics. 123(5): 847-858.
Mansur LM, Orf JH, Chase K, Jarvik T, Cregan PB, Lark KG (1996) Genetic mapping of agronomic traits using recombinant inbred lines of soybean. Crop Science. 36(5): 1327-1336.
Massman J, Cooper B, Horsley R, Neate S, Dill-Macky R, Chao S, Dong Y, Schwarz P, Muehlbauer GJ, Smith KP (2011) Genome-wide association mapping of Fusarium head blight resistance in contemporary barley breeding germplasm. Molecular Breeding. 27(4): 439-454.
Mian MAR, Bailey MA, Tamulonis JP, Shipe ER, Carter TE Jr, Parrott WA, Ashley DA, Hussey RS, Boerma HR (1996) Molecular markers associated with seed weight in two soybean populations. Theoretical and Applied Genetics. 93(7): 1011-1016.
Mian MAR, Cooper RL, Dorrance AE (2008) Registration of ‘Prohio’ soybean. Journal of Plant Registrations. 2: 208-210.
Nichols DM, Glover KD, Carlson SR, Specht JE, Diers BW (2006) Fine mapping of a seed protein QTL on soybean linkage group I and its correlated effects on agronomic traits. Crop Science. 46(2): 834-839.
Nyquist WE (1991) Estimation of heritability and prediction of selection response in plant populations. Critical Reviews in Plant Sciences. 10(3): 235-322.
Orf JH, Chase K, Jarvik T, Mansur LM, Cregan PB, Adler FR, Lark KG (1999) Genetics of soybean agronomic traits: I. Comparison of three related recombinant inbred populations. Crop Science. 39(6): 1642-1651.
Panthee DR, Pantalone VR, West DR, Saxton AM, Sams CE (2005) Quantitative trait loci for seed protein and oil concentration, and seed size in soybean. Crop Science. 45(5): 2015-2022.
Poysa V, Woodrow L (2002) Stability of soybean seed composition and its effect on soymilk and tofu yield and quality. Food Research International. 35: 337-345.
Poysa V, Woodrow L, Yu K (2006) Effect of soy protein subunit composition on tofu quality. Food Research International. 39(3): 309-317.
Qi ZM, Wu Q, Han X, Sun YN, Du XY, Liu CY, Jiang HW, Hu GH, Chen QS (2011) Soybean oil content QTL mapping and integrating with meta-analysis method for mining genes. Euphytica. 179: 499-514.
Qiu BX, Arelli PR, Sleper DA (1999) RFLP markers associated with soybean cyst nematode resistance and seed composition in a ‘Peking’בEssex’ population. Theoretical and Applied Genetics. 98(3): 356-364. Reinprecht Y, Poysa VW, Yu K, Rajcan I, Ablett GR, Pauls KP (2006) Seed and agronomic QTL in low linolenic acid, lipoxygenase-free soybean (Glycine max (L.) Merrill) germplasm. Genome. 49(12): 1510-1527. Salas P, Oyarzo-Llaipen JC, Wang D, Chase K, Mansur (2006) Genetic mapping of seed shape in three populations of recombinant inbred lines of soybean (Glycine max L. Merr.). Theoretical and Applied Genetics. 113: 1459-1466. Scheet P, Stephens M (2006) A fast and flexible statistical model for large-scale population genotype data: Applications to inferring missing genotypes and haplotypic phase. American Journal of Human Genetics. 78: 629–644. Sebolt AM, Shoemaker RC, Diers BW (2000) Analysis of a quantitative trait locus allele from wild soybean that increases seed protein concentration in soybean. Crop Science 40(5): 1438-1444. Shi A, Chen P, Zhang B, Hou A (2010) Genetic diversity and association analysis of protein and oil content in food-grade soybeans from Asia and the United States. Plant Breeding. 129: 250-256. Soybase. Map QTL. http://www.soybase.org. Reviewed November 6, 2012. 98 Specht JE, Chase K, Macrander M, Graef GL, Chung J, Markwell JP, Germann M, Orf JH, Lark KG (2001) Soybean Response to Water:A QTL Analysis of Drought Tolerance. Crop Science 41(2): 493-509. Stephens M, Donnelly P (2003) A comparison of Bayesian methods for haplotype reconstruction from population genotype data. American Journal of Human Genetics 73(5): 1162–1169. St. Martin SK, Feller MK, Fioritto MJ, McIntyre SA, Dorrance AE, Berry SA, Sneller CH (2006) Registration of 'HS0-3243' Soybean. Crop Science 46: 1811. St. Martin SK, Feller MK, McIntyre SA, Fioritto RJ, Dorrance AE, Berry SA, Sneller CH (2008) Registration of ‘Dennison’ Soybean. Journal of Plant Registrations 2: 21. 
Tajuddin T, Watanabe S, Yamanaka N, Harada K (2003) Analysis of quantitative trait loci for protein and lipid contents in soybean seeds using recombinant inbred lines. Breeding Science 53(2): 133-140. Teng W, Han Y, Du Y, Sun D, Zhang Z, Qiu L, Sun G, Li W (2008) QTL analyses of seed weight during the development of soybean (Glycine max L. Merr.). Heredity. 102: 372-380. Wang J, McClean PE, Lee R, Goos RJ, Helms T (2008) Association mapping of iron deficiency chlorosis loci in soybean (Glycine max L. Merr.) advanced breeding lines. Theoretical and Applied Genetics. 116(6): 777-787. Ye S, Dhillon S, Ke X, Collins AR, Day INM (2001) An efficient procedure for genotyping single nucleotide polymorphisms. Nucleic Acid Research. 29(17): E88-8. Yung-Tsi B, Bindu J, Steven BC, Michelle AG, Diers BW et al. (2010) Complementary genetic and genomic approaches help characterize the linkage group I seed protein QTL in soybean. BMC Plant Biology. 10: 41-64. Zhang B, Chen PY, Florez-Palacios SL, Shi A, Hou A, Ishibashi T (2010) Seed quality attributes of food-grade soybeans from the U.S. and Asia. Euphytica. 173:387-396. Zhu CS, Gore M, Buckler ES, Yu J (2008) Status and prospects of association mapping in plants. Plant Genome. 1: 5-20. Zhu YL, Song QJ, Hyten DL, Van Tassell CP, Matukumalli LK et al. (2002) Genetics 163: 1123-1134. 99 BIBLIOGRAPHY American Public Health Association. Restricting trans fatty acids in the food supply. http://www.apha.org/advocacy/policy/policysearch/default.htm?id=1366. Retrieved 11-03-2012. American Soybean Association. Soy Stats 2012. http://www.soystats.com/2012. Retrieved November 6, 2012. Aranzana MJ, Kim S, Zhao K, Bakker E, Horton M, Jakob K, Lister C, Molitor J, Shindo C, Tang C, Toomajian C, Traw B, Zheng H, Bergelson J, Dean C, Marjoram P, Nordborg M (2005) Genome-wide association mapping in Arabidopsis identifies previously known flowering time and pathogen resistance genes. PLoS Genetics. 1: 531-539. 
Aziadekey M, Schapaugh WT, Herald TJ (2002) Genotype by environment interaction for soymilk and tofu quality characteristics. Journal of Food Quality. 25: 243-259. Bachlava E, Dewey RE, Burton JW, Cardinal AJ (2009) Mapping and comparison of quantitative trait loci for oleic acid seed content in two segregating soybean populations. Crop Science. 49(2): 433-442. Barrett JC, Fry B, Maller J, Daly MJ (2005) Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 54(2): 263-265. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B. 57: 289-300. Bernard RL, Lindahl DA (1972) Registration of Williams Soybean (Reg. No. 94). Crop Science. 12: 716. Bradbury PJ, Zhang ZZ, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES (2007) TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 23: 2633- 2635. Breseghello F, Sorrells ME (2006) Association mapping of kernel size and milling quality in wheat (Triticum aestivum L.) cultivars. Genetics. 172: 1165-1177. 100 Brummer EC, Graef GL, Orf JH, Wilcox JR, Shoemaker RC (1997) Mapping QTL for seed protein and oil content in eight soybean populations. Crop Science 37(2): 370378. Bolon Y-T, Joseph B, Cannon SB, Graham MA, Diers BW, Farmer AD, May GD, Muehlbauer GJ, Specht JE, Tu ZJ, Weeks N, Xu WW, Shoemaker RC, Vance CP (2010) Complementary genetic and genomic approaches help characterize the linkage group I seed protein QTL in soybean. BMC Plant Biology. 10: 41-64. Burton JW, Wilson RF, Novitzky W, Carter TE (2004) Registration of ‘Soyola’ soybean. Crop Science. 44(2): 687-688. Burton JW, Wilson RF, Rebetzke GJ, Pantalone VR (2006) Registration of N98–4445A mid-oleic soybean germplasm line. Crop Science. 46(2): 1010-1012. Buzzell RI, Anderson TR, Hamill AS, Welacky TW (1991) Harovinton soybean. Canadian Journal of Plant Science. 71: 525-526. 
Cai TD, Chang KC (1998) Characteristics of production-scale tofu as affected by soymilk coagulation method: propeller blade size, mixing time and coagulant concentration. Food Research International. 31(4): 289-295. Cai TD, Chang KC (1999) Processing effect on soybean storage proteins and their relationship with tofu quality. Journal of Agricultural and Food Chemistry. 47(2): 720-727. Cai TD, Chang KC, Shih MC, Hou HJ, Ji M (1997) Comparison of bench and production scale methods for making soymilk and tofu from 13 soybean varieties. Food Research International. 30(9): 659-668. Cardon LR, Palmer LJ (2003) Population stratification and spurious allelic association. The Lancet. 361: 598-604. Chen Q, Zhang Z, Liu C, Xin D, Qiu H, Shan D, Shan C, Hu G (2007) QTL Analysis of Major Agronomic Traits in Soybean. Agriculture Scienc in China. 6(4): 399 -405. Cheng YJ, Thompson LD, Brittin HC (1990) Sogurt, a yogurt-like soybean product development and properties. Journal of Food Science. 55: 1178-1179. Chianu JN, Zegeye EW, Nkonya E M (2010) Global Soybean Marketing and Trade: a Situation and Outlook Analysis. In: The Soybean: Botany, Production and Uses. G. Singh., Ed. CAB International: Wallingford, England. 101 Choi I-Y, Hyten DL, Matukumalli LK, Song Q, Chaky JM, Quigley CV, Chase K, Lark KG, Reiter RS, Yoon M-S, Hwang E-Y, Yi S-I, Young ND, Shoemaker RC, van Tassell CP, Specht JE, Cregan PB (2007) A soybean transcript map: gene distribution, haplotype and single-nucleotide polymorphism analysis. Genetics. 176: 685–696. Chung J, Babka HL, Graef GL, Staswick PE, Lee GJ, Cregand PB, Shoemaker RC, Specht JE (2003) The seed protein, oil, and yield QTL on soybean linkage group I. Crop Science. 43: 1053–1067. Clemente TE, Cahoon EB (2009) Soybean oil: genetic approaches for modification of functionality and total content. Plant Physiology. 151: 1030-1040. Cober ER, Voldeng HD, Fregeau-Reid JA (1997) Heritability of Seed Shape and Seed Size in Soybean. Crop Science. 
37: 1767-1769. Cober ER, Fregeau-Reid JA, Butler G, Voldeng HD (2006) Genotype–Environment analysis of parameters describing water uptake in natto soybean. Crop Science. 46: 2415-2419. Cooper RL, Hammond RB (1999) Registration of Insect-Resistant Soybean Germplasm Lines HC95-24MB and HC95-15MB. Crop Science. 39: 599. Cregan PB, Jarvik T, Bush AL, Shoemaker RC, Lark KG, Kahler AL, Kaya N, VanToai TT, Lohnes DG, Chung J, Specht JE (1999) An integrated genetic linkage map of the soybean genome. Crop Science. 39: 1464-1490. Csanadi G, Vollmann J, Stift G, Lelley T (2001) Seed quality QTLs identified in a molecular map of early maturing soybean. Theoretical and Applied Genetics. 103(67): 912-919. Daultry S (1976) Principal Components Analysis. Geo Abstracts Limited: East Anglia, Norwich. Diers BW, Keim P, Fehr WR, Shoemaker RC (1992) RFLP analysis of soybean seed protein and oil content. Theoretical and Applied Genetics. 83: 608-612. Diers BW, Cary TR, Thomas DJ, Nickell CD (2006) Registration of ‘LD00-3309’ soybean. Crop Science. 46:1384. Dimitri C, Greene C (2002) Recent growth patterns in the US organic foods market. Agriculture Information Bulletin. 777. Evans DE, Tsukamoto C, Nielson NC (1997) A small scale method for the production of soymilk and silken tofu. Crop Science. 37: 1463-1471. 102 Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology. 14: 26112620. Fehr WR, Bahrenfus JB, Walker AK (1984) Registration of Vinton 81 Soybean. Crop Science. 24(2): 384. Falush D, Stephens M, Pritchard JK (2003) Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics. 164: 1567-1587. Gandhi AP, Bourne MC (1988) Effect of pressure and storage time on texture profile parameters of soybean curd (tofu). Journal of Texture Studies. 19: 137-142. 
Garris AJ, Tai TH, Coburn J, Kresovich S, McCouch S (2005) Genetic structure and diversity in Oryza sativa L. Genetics. 169:1631-1638. Gizlice Z, Carter TE, and Burton JW (1994) Genetic base for North American public soybean cultivars released between 1947 and 1988. Crop Science. 34:1143-1151 Glaszmann JC, Kilian B, Upadhyaya HD, Varshney RK (2010) Accessing genetic diversity for crop improvement. Current Opinion in Plant Biology. 13(2): 167-173. Golbitz P, Jordan J (2006) Soyfoods: Market and Products. In: Soy Applications in Food. Riaz, M. N., Ed. Taylor & Francis: Boca Raton, FL. Graef GL, Specht JE (1989) Fitting the niche food grade soybean production: a new opportunity for Nebraska soybean producers. Nebraska Department of Agriculture, Lincoln, pp 18–27. Griffis G, Wiedermann L (1990) Marketing food-quality soybeans in Japan, 3rd edn. American Soybean Association, St. Louis. Grundy SM, Florentin L, Nix D, Whelan MF (1988) Comparison of monounsaturated fatty acids and carbohydrates for reducing raised levels of plasma cholesterol in man. The American Journal of Clinical Nutrition. 47: 965-969. Han YP, Xie DX, Teng WL, Zhang SH, Chang W, Li WB (2011) Dynamic QTL analysis of linolenic acid content in different developmental stages of soybean seed. Theoretical and Applied Genetics. 122: 1481-1488. Hirata TH, Abe J, Shimamoto Y (1999) Genetic structure of the Japanese soybean population. Genetic Resources and Crop Evolution. 46: 441-453. 103 Hoeck JA, Fehr WR, Shoemaker RC, Welke GA, Johnson SL, Cianzio SR (2003) Molecular marker analysis of seed size in soybean. Crop Science. 43(1): 68-74. Hong K-J, Lee C-H, Kim SW (2004) Aspergillus oryzae GB-107 fermentation improves nutritional quality of food soybeans and feed soybean meals. Journal of Medicinal Food. 7(4): 430-435. Hou HJ, Chang KC, Shih MC (1997) Yield and textural properties of soft tofu as affected by coagulation method. Journal of Food Science. 62(4): 824-827. 
Hyten DL, Pantalone VR, Sams CE, Saxton AM, Landau-Ellis D, Stefaniak TR, Schmidt ME (2004) Seed quality QTL in a prominent soybean population. Theoretical and Applied Genetics. 109: 552–561. Hyten DL, Song Q, Choi I-Y, Yoon M-S, Specht JE, Matukumalli LK, Nelson RL, Shoemaker RC, Young ND, Cregan PB (2008) High-throughput genotyping with the GoldenGate assay in the complex genome of soybean. Theoretical and Applied Genetics. 116(7): 945-952. Hyten DL, Choi I-Y, Song Q, Specht JE, Carter TE, Shoemaker RC, Hwang E-Y, Matukumallif LK, Cregan PB (2010) A high density integrated genetic linkage map of soybean and the development of a 1536 universal soy linkage panel for quantitative trait locus mapping. Crop Science. 50: 960-968. Johnson HW, Bernard RL (1962) Soybean genetics and breeding. Advances in Agronomy. 14: 149-221. Johnson LD, Wilson LA (1984) Influence of soybean variety and the method of processing in tofu manufacturing: comparison of methods for measuring soluble solids in soymilk. Journal of Food Science. 49(1): 202-204. Jones H, Civáň P, Cockram J, Leigh FJ, Smith LM, Jones MK, Charles MP, MolinaCano J-L, Powell W, Jones G, Brown TA (2011) Evolutionary history of barley cultivation in Europe revealed by genetic analysis of extant landraces. Evolutionary Biology. 11: 320-331. Jun T-H. Van K, Kim MY, Lee SH, Walker DR (2008) Association analysis using SSR markers to find QTL for seed protein content in soybean. Euphytica. 162(2): 179-191. Keim P, Olson TC, Shoemaker RC (1988) A rapid protocol for isolating soybean DNA. Soybean Genetics Newsletter. 15: 150-152. Kim KS, Diers BW, Hyten DL, Mian MAR, Shannon JG, Nelson RL (2012) Identification of positive yield QTL alleles from exotic soybean germplasm in two backcross populations. Theoretical and Applied Genetics. 125: 1353-1369. 104 Krishnan HB (2005) Engineering soybean for enhanced sulfur amino acid content. Crop Science. 45: 454-461. 
Kumar V, Rani A, Solanki S, Hussain SM (2006) Influence of growing environment on the biochemical composition and physical characteristics of soybean seed. Journal of Food Composition and Analysis. 19(2): 188-195. Kuroda Y, Kaga A, Tomooka N, Vaughan DA (2006) Population genetic structure of Japanese wild soybean (Glycine soja) based on microsatellite variation. Molecular Ecology. 15: 959-974. Kwan SW, Easa AM (2003) Comparing physical properties of retort-resistant glucono- δlactone tofu treated with commercial transglutaminase enzyme or low levels of glucose. LWT-Food Science and Technology. 36(6): 643-646. Lam HM, Xu X, Liu X, Chen WB, Yang GH et al. (2010) Resequencing of 31 wild and cultivated soybean genomes identifies patterns of genetic diversity and selection. 42(12): 1053-1059. Lange CE, Federizzi LC (2009) Estimation of soybean genetic progress in the South of Brazil using multi-environmental yield trials. Science Agricola. 66: 309–316. Lee JD, Bilyeu KD, Shannon JG (2007) Genetics and breeding for modified fatty acid profile in soybean seed oil. Journal of Crop Science and Biotechnology. 10(4): 201210. Liang HZ, Yu YL, Wang S-F, Lian Y, T-F Wang, Wei Y-L, Gong P-T, Liu X-Y, Fang X-J, Zhang M-C (2010) QTL Mapping of isoflavone, oil and protein contents in soybean (Glycine max L. Merr.). Agricultural Sciences in China. 9: 1108-1116. Lim BT, DeMan JM, DeMan L, Buzzel RI (1990) Yield and quality of tofu as affected by soybean and soymilk characteristics, calcium sulfate coagulant. Journal of Food Science. 55(4): 1088-1107. Liu KJ, Goodman M, Muse S, Smith JS, Buckler E Doebley J (2003) Genetic structure and diversity among maize inbred lines as inferred from DNA microsatellites. Genetics. 165: 2117-2128. Liu KS (1997) Soybeans: chemistry, technology, and utilization. New York: Chapman & Hall. Liu W, Gowda M, Steinhoff J, Maurer HP, Würschum T, Longin CFH, Cossic F, Reif JC (2011) Association mapping in an elite maize breeding population. 
Theoretical and Applied Genetics. 123(5): 847-858. 105 Liu ZS, Chang SKC (2004) Effect of soy milk characteristics and cooking conditions on coagulant requirements for making filled tofu. Journal of Agricultural and Food Chemistry. 52(11): 3405-3411. Mansur LM, Orf JH, Chase K, Jarvik T, Cregan PB, Lark KG (1996) Genetic mapping of agronomic traits using recombinant inbred lines of soybean. Crop Science. 36(5): 1327-1336. Mardia KV, Kent JT, Bibby JM (1979) Multivariate Analysis. Academic Press: London. Massman J, Cooper B, Horsley R, Neate S, Dill-Macky R, Chao S, Dong Y, Schwarz P, Muehlbauer GJ, Smith KP (2011) Genome-wide association mapping of Fusarium head blight resistance in contemporary barley breeding germplasm. Molecular Breeding. 27(4): 439-454. McHale LK, Feller MK, McIntyre SA, Berry SA, St. Martin SK, Dorrance AE (2012) Registration of 'Summit', a high-yielding soybean with race-specific resistance to Phytophthora sojae. Journal of Plant Registrations. doi: 10.3198/jpr2012.01.0012crc. Mian MAR, Bailey MA, Tamulonis JP, Shipe ER, Carter TE Jr, Parrott WA, Ashley DA, Hussey RS, Boerma HR (1996) Molecular markers associated with seed weight in two soybean populations. Theoretical and Applied Genetics. 93(7): 1011-1016. Mian MAR, Cooper RL, Dorrance AE (2008) Registration of ‘Prohio’ soybean. Journal of Plant Registrations. 2: 208-210. Miller JF, Zimmerman, Vick BA (1987) Genetic control of high oleic acid content in sunflower oil. Crop Science. 27: 923-926. Min S, Yu Y, Martin SS (2005) Effect of soybean varieties and growing locations on the physical and chemical properties of soymilk and tofu. Journal of Food Science. 70(1): C8-C21. Mujoo R, Trinh DT, Ng PK (2003) Characterization of storage proteins in different soybean varieties and their relationship to tofu yield and texture. Food Chemistry. 82(2): 265-273. 
Mullin WJ, Fregeau-Reid JA, Butler M, Poysa V, Woodrow L, Jessop DB, Raymond D (2001) An interlaboratory test of a procedure to assess soybean quality for soymilk and tofu production. Food Research International. 34: 669-677. Mullin WJ, Xu W (2001) Study of soybean seed coat components and their relationship to water absorption. Journal of Agricultural and Food Chemistry. 49(11): 5331-5335. 106 Mullin WJ, Fregeau-Reid JA, Butler M, Poysa V, Woodrow L, Jessop DB, Raymond D (2001) An interlaboratory test of a procedure to assess soybean quality for soymilk and tofu production. Food Research International. 34(8): 669-677. National agricultural Statistics Service. Quick Stats. http://quickstats.nass.usda.gov. Retrieved September 7, 2011. Nichols DM, Glover KD, Carlson SR, Specht JE, Diers BW (2006) Fine mapping of a seed protein QTL on soybean linkage group I and its correlated effects on agronomic traits. Crop Science. 46(2): 834-839. Nyquist WE (1991) Estimation of heritability and prediction of selection response in plant populations. Critical Reviews in Plant Sciences. 10(3): 235-322 Ohio Soybean Council. International Marketing. http://associationdatabase.com/aws/OHSOY/pt/sp/osc_home. Retrieved September 7, 2011. Orf JH, Chase K, Jarvik T, Mansur LM, Cregan PB, Adler FR, Lark KG (1999) Crop Science. 39(6): 1642-1651. Panthee DR, Pantalone VR, West DR, Saxton AM, Sams CE (2005) Quantitative trait loci for seed protein and oil concentration, and seed size in soybean. Crop Science. 45(5): 2015-2022. Panthee DR, Pantalone VR, Saxton A (2006) Modifier QTL for fatty acid composition in soybean oil. Euphytica. 152(1): 67-73. Poysa V, Woodrow L (2002) Stability of soybean seed composition and its effect on soymilk and tofu yield and quality. Food Research International. 35: 337-345. Poysa V, Woodrow L, Yu K (2006) Effect of soy protein subunit composition on tofu quality. Food Research International. 39: 309-317. 
Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics. 155: 945-959. Qi ZM, Wu Q, Han X, Sun YN, Du XY, Liu CY, HW Jiang, Hu GH, Chen QS (2011) Soybean oil content QTL mapping and integrating with meta-analysis method for mining genes. Euphytica 179: 499-514. Qiu BX, Arelli PR, Sleper DA (1999) RFLP markers associated with soybean cyst nematode resistance and seed composition in a ‘Peking’בEssex’ population. Theoretical and Applied Genetics. 98(3): 356-364. 107 Rao MSS, Mullinix BG, Rangappa M, Cebert E, Bhagsari AS, Sapra VT, Joshi M, Dadson RB (2002) Genotype × environment interactions and yield stability of foodgrade soybean genotypes. Agronomy Journal. 94(1): 72-80. Reinprecht Y, Poysa VW, Yu K, Rajcan I, Ablett GR, Pauls KP (2006) Seed and agronomic QTL in low linolenic acid, lipoxygenase-free soybean (Glycine max (L.) Merrill) germplasm. Genome. 49(12): 1510-1527. Salas P, Oyarzo-Llaipen JC, Wang D, Chase K, Mansur L (2006) Genetic mapping of seed shape in three populations of recombinant inbred lines of soybean (Glycine max L. Merr.). Theoretical and Applied Genetics. 113(8): 1459-1466. Scheet P, Stephens M (2006) A fast and flexible statistical model for large-scale population genotype data: Applications to inferring missing genotypes and haplotypic phase. American Journal of Human Genetics. 78: 629–644. Sebolt AM, Shoemaker RC, Diers BW (2000) Analysis of a quantitative trait locus allele from wild soybean that increases seed protein concentration in soybean. Crop Science 40(5): 1438-1444. Shannon JG, Sleper DA, Arelli DR, Burton JW, Wilson RF, Anand SC (2005) Registration of S01-9269 Soybean Germplasm Line Resistant to Soybean Cyst Nematode with Seed Oil Low in Saturates. Crop Science. 45(4): 1673-1674. Shen CF, De Man L, Buzzell RI, De Man JM (1991) Yield and Quality of tofu as affected by soybean and soymilk characteristics: Glucono-delta-lactone coagulant. Journal of Food Science. 
56(1): 109-112. Shi A, Chen P, Zhang B, Hou A (2010) Genetic diversity and association analysis of protein and oil content in food-grade soybeans from Asia and the United States. Plant Breeding. 129(3): 250-256. Shih MC, Hou HJ, Chang KC (1997) Process optimization for soft tofu. Journal of Food Science. 62(4):833-837. Shih MC, Yang KT, Kuo SJ (2002) Quality and antioxidative activity of black soybean tofu as affected by bean cultivar. 67(2): 480-484. Simko I, Pechenick DA, McHale LK, Truco MJ, Ochoa OE, Michelmore RW Scheffler B E (2009) Association mapping and marker-assisted selection of the lettuce dieback resistance gene Tvr1. BMC Plant Biology. 9(1): 135. Sim SC, Robbins MD, Deynze AV, Michel AP, Francis DM (2011) Population structure and genetic differentiation associated with breeding history and selection in tomato (Solanum lycopersicum L.). Heredity. 106: 927-935. 108 Song QJ, Marek LF, Shoemaker RC, Lark KG, Concibido VC, Delannay X, Specht JE, Cregan P B (2004) A new integrated genetic linkage map of the soybean. TAG Theoretical and Applied Genetics. 109(1): 122-128. Soybase. Map QTL. http://www.soybase.org. Reviewed November 6, 2012. Specht JE, Chase K, Macrander M, Graef GL, Chung J, Markwell JP, Germann M, Orf JH, Lark KG (2001) Soybean Response to Water:A QTL Analysis of Drought Tolerance. Crop Science 41(2): 493-509. Stephens M, Donnelly P (2003) A comparison of Bayesian methods for haplotype reconstruction from population genotype data. American Journal of Human Genetics 73(5): 1162–1169. St. Martin SK, Calip-DuBois AJ, Fioritto RJ, Schmitthenner AF, Min DB, Yang T-S, Yu YM, Cooper RL, Martin RJ (1996) Registration of ‘Ohio FG1’ Soybean. Crop Science. 26: 813. St. Martin SK, Feller MK, Fioritto MJ, McIntyre SA, Dorrance AE, Berry SA, Sneller CH (2006) Registration of ‘HS0–3243’ Soybean. Crop Science. 46:1811. St. Martin SK, Mills GR, Fioritto RJ, McIntyre SA, Dorrance AE, Berry SA (2006) Registration of ‘Ohio FG5’Soybean. Crop science. 
46(6): 2709-2709. St. Martin SK, Mills GR, Fioritto RJ, McIntyre SA, Dorrance AE, Cooper RL (2004) Registration of ‘Ohio FG3’ soybean. Crop Science. 44: 687. St. Martin SK, Feller MK, McIntyre SA, Fioritto RJ, Dorrance AE, Berry SA, Sneller CH (2008) Registration of ‘Dennison’ Soybean. Journal of Plant Registrations 2: 21. Sun N, Breene WM (1991) Calcium sulfate concentration influence on yield and quality of tofu from five soybean varieties. Journal of Food Science. 56(6): 1604-1607. Tajuddin T, Watanabe S, Yamanaka N, Harada K (2003) Analysis of quantitative trait loci for protein and lipid contents in soybean seeds using recombinant inbred lines. Breeding Science 53(2): 133-140. Teng W, Han Y, Du Y, Sun D, Zhang Z, Qiu L, Sun G, Li W (2008) QTL analyses of seed weight during the development of soybean (Glycine max L. Merr.). Heredity. 102(4): 372-380. Tian F, Bradbury PJ, Brown PJ, Hung H, Sun Q, Flint-Garcia S, Buckler ES (2011) Genome-wide association study of leaf architecture in the maize nested association mapping population. Nature Genetics. 43(2): 159-162. 109 Van Inghelandt D, Melchinger AE, Lebreton C, Stich B (2010) Population structure and genetic diversity in a commercial maize breeding program assessed with SSR and SNP markers. Theoretical and Applied Genetics. 120: 1289-1299. Wang CCR, Chang SKC (1995) Physiochemical properties and tofu quality of soybean cultivar Proto. Journal of Agricultural Food Chemistry. 43: 3029-3034. Wang LF, Ge H, Hao C, Dong Y, Zhang X (2012) Identifying loci influencing 1,000kernel weight in wheat by microsatellite screening for evidence of selection during breeding. PLoS ONE. 7(2): e29432. Wang HL, Swain EW, Kwolek WF, Fehr WR (1983) Effect of soybean varieties on the yield and quality of tofu. Cereal Chemistry. 60(3): 245-248. 
Wang HY, Smith KP, Combs E, Blake T, Horsley RD, Muehlbauer GJ (2012) Effect of population size and unbalanced data sets on QTL detection using genome-wide association mapping in barley breeding germplasm. Theoretical and Applied Genetics. 124: 111-124. Wang J, McClean PE, Lee R, Goos RJ, Helms T (2008) Association mapping of iron deficiency chlorosis loci in soybean (Glycine max L. Merr.) advanced breeding lines. Theoretical and Applied Genetics. 116(6): 777-787. Wei Q, Chang SKC, Characteristics of fermented natto products as affected by soybean cultivars. Journal of Food Processing Preservation. 28: 251-273. Xu Y, Li HN, Li GJ, Wang X, Cheng LG, Zhang YM (2011) Mapping quantitative trait loci for seed size traits in soybean (Glycine max L. Merr.). Theoretical and Applied Genetics. 122(3): 581-594. Yaklich RW, Vinyard B, Camp M, Douglass S (2002) Analysis of seed protein and oil from soybean northern and southern region uniform tests. Crop Science. 42(5): 15041515. Yamaura, K (2011) Market power of the Japanese non-GM soybean import market: The US exporters vs. Japanese importers. Asian Journal of Agriculture and Rural Development. 1(2): 80-89. Yan J, Wen-Ju Z, Da-Xu F, Bao-Rong L (2003) Sampling strategy within a wild soybean population based on its genetic variation detected by ISSR markers. Acta Botanica Sinica. 45(8): 995-1002. Yan WG, Li Y, Agrama HA, Luo D, Gao F, Lu X, Ren G (2009) Association mapping of stigma and spikelet characteristics in rice (Oryza sativa L.). Molecular Breeding. 24(3): 277-292. 110 Yasir SBM, Sutton KH, Newberry MP, Andrews NR, Gerrard JA (2007) The impacts of transglutaminase on soy proteins and tofu texture. Food Chemistry. 104: 1491-1501. Ye S, Dhillon S, Ke X, Collins AR, Day INM (2001) An efficient procedure for genotyping single nucleotide polymorphisms. Nucleic Acid Research. 29(17): E88-8. 
Yuan S, Chang SKC (2007) Texture profile of tofu as affected by Instron parameters and sample preparation, and correlations of Instron hardness and springiness with sensory scores. Journal of Food Science. 72(2): S136-S145. Yu JM, Buckler ES (2006) Genetic association mapping and genome organization of maize. Current Opinion in Biotechnology. 17: 155-160. Yung-Tsi B, Bindu J, Steven BC, Michelle AG, Diers BW et al. (2010) Complementary genetic and genomic approaches help characterize the linkage group I seed protein QTL in soybean. BMC Plant Biology. 10: 41-64. Zhang B, Chen P, Florez-Palacios SL, Shi A, Hou A, Ishibashi T (2010) Seed quality attributes of food-grade soybeans from the US and Asia. Euphytica. 173(3): 387-396. Zhang DD, Bai GH, Zhu CS, Yu JM, and Carver BF (2010) Genetic diversity, population structure, and linkage disequilibrium in U.S. elite winter wheat. The Plant Genome. 3(2): 117-127. Zhu CS, Gore M, Buckler ES, Yu J (2008) Status and prospects of association mapping in plants. The Plant Genome. 1(1): 5-20. Zhu YL, Song QJ, Hyten DL, Van Tassell CP, Matukumalli LK et al. (2002) Genetics 163: 1123-1134. 111