International Biometric Society

SELECTION OF SUGARCANE FAMILIES USING SYNTHETIC DATA AND ARTIFICIAL NEURAL NETWORK

Luiz A. Peternelli (1,2), Ethel F.O. Peternelli (1), Édimo F.A. Moreira (1), Moysés Nascimento (1)

(1) Department of Statistics, Federal University of Viçosa, Viçosa, MG, Brazil.
(2) Sugarcane breeding program, RIDESA-UFV, Brazil.

Sugarcane is recognized worldwide as an important alternative source of fuel. In addition to the increased demand for sugar, the higher demand for ethanol creates a need either to expand the area planted with sugarcane or, in a smarter and more environmentally friendly way, to increase its productivity. Sugarcane breeding programs worldwide have invested in ways to speed up the identification, selection, and hence the release of new, increasingly productive varieties. To this end, a large number of genotypes must be evaluated in field experiments. The selection process preferably occurs in two stages. First, the breeder identifies the best families (usually half-sib or full-sib families) in the experiment. Subsequently, the breeder seeks, within these top families, promising individuals or clones, which are passed on to the next stages of the breeding program. These clones are then evaluated in controlled experiments. For families to be evaluated with respect to yield, it is necessary to weigh the plot corresponding to each family. If a family has a higher average than the overall mean of the experiment, it is considered a promising family. Alternatively, the yield may be obtained as a function of other characters called yield components, which are easily collected at the plot level: number of stalks (NS), average diameter of stalks (DS) and average height of stalks (HS). This study aimed to evaluate the use of synthetic data and artificial neural networks (ANN) as a way to facilitate the identification of the best families without weighing all the material in the field.
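The family selection rule described above (a family is promising when its average yield exceeds the overall mean of the experiment) can be illustrated with a minimal sketch. The function name `select_families` and the yield values are hypothetical and are not data from the study.

```python
import numpy as np

def select_families(family_yield):
    """Flag families whose mean yield exceeds the overall experiment mean."""
    y = np.asarray(family_yield, dtype=float)
    return y > y.mean()

# Hypothetical family yields for one experiment (illustrative values only).
tsh = [80.0, 95.0, 70.0, 110.0, 85.0]
selected = select_families(tsh)  # overall mean is 88.0
print(selected.tolist())         # [False, True, False, True, False]
```

Applying the rule this way requires the plot weight of every family, which is the field effort the study seeks to reduce.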
We evaluated real data from five family selection experiments (totaling 110 families), provided by the sugarcane breeding program at the Federal University of Viçosa, MG, Brazil. From each plot we collected the mass of stalks (MS, later converted to tons of stalks per hectare, TSH) and the yield components NS, DS and HS. Based on the actual TSH, we classified each family as "selected" or "not selected". NS, DS and HS were used as the input variables. A backpropagation ANN with a single hidden layer was used in the analyses. We evaluated two main scenarios: (i) data from one experiment were used as the training population while the rest were used as the testing population, and (ii) the same as (i), but with the training population augmented by synthetic data obtained by simulation from the mean vector and covariance matrix of the variables TSH, NS, DS and HS in the original training set. Other derived sub-scenarios were also considered. Comparisons were based on the percentage of misclassification obtained in each analysis. The results showed that selecting families based only on NS, DS and HS with an ANN trained on data augmented with synthetic observations appears potentially useful and could serve as an alternative to ease family selection under field conditions, especially when a large number of families is to be considered.

(FAPEMIG, CAPES, CNPq)

International Biometric Conference, Florence, ITALY, 6 – 11 July 2014
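Scenario (ii) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the simulating distribution, so a multivariate normal is assumed; labelling synthetic families by comparing their simulated TSH with the training mean is also an assumption; the toy data, network size, learning rate and all function names are hypothetical.

```python
import numpy as np

def toy_experiment(n, rng):
    """Hypothetical standardized data; columns are TSH, NS, DS, HS."""
    comps = rng.normal(size=(n, 3))
    tsh = comps.sum(axis=1) + 0.3 * rng.normal(size=n)
    return np.column_stack([tsh, comps])

def augment_with_synthetic(train, n_synth, rng):
    """Append draws from a multivariate normal (assumed) fitted to train."""
    mean = train.mean(axis=0)
    cov = np.cov(train, rowvar=False)
    synth = rng.multivariate_normal(mean, cov, size=n_synth)
    return np.vstack([train, synth])

def fit_one_hidden_layer(X, y, rng, n_hidden=4, lr=0.1, epochs=2000):
    """Single-hidden-layer network trained by batch backpropagation."""
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, n_hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                    # hidden activations
        p = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))    # P(selected)
        d_out = (p - y) / len(y)                    # cross-entropy gradient at the logit
        d_hid = np.outer(d_out, W2) * (1.0 - H**2)  # backpropagate through tanh
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum()
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2) > 0

rng = np.random.default_rng(1)
train = toy_experiment(22, rng)   # one experiment as the training population
test = toy_experiment(88, rng)    # the remaining families as the testing population

augmented = augment_with_synthetic(train, 200, rng)
threshold = train[:, 0].mean()    # overall mean TSH of the training experiment
y_train = (augmented[:, 0] > threshold).astype(float)
params = fit_one_hidden_layer(augmented[:, 1:], y_train, rng)

y_test = test[:, 0] > test[:, 0].mean()
error_pct = 100.0 * np.mean(predict(params, test[:, 1:]) != y_test)
print(f"misclassification: {error_pct:.1f}%")
```

Only NS, DS and HS enter the network as inputs; TSH is used solely to label the training families. With real measurements the inputs would normally be rescaled before training, which the standardized toy data make unnecessary here.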