Non-Normal Data in Agricultural Experiments

Sunday – April 28th
7:30 – 8:30 am
Registration and Check-in for Workshop Participants,
Pre-convene Continental Breakfast in Foyer ABCD
Workshop: 8:30 am – 5:00 pm
ABC-McDowell/Tuttle/Alcove
Applied Statistics in Agriculture Short Course
Statistical Graphics in Agriculture
Kevin Wright – Research Scientist at DuPont Pioneer
This course will illustrate the use of statistical graphics for agricultural
data. We will start with understanding perception of the basic building blocks
for graphics and how people perceive those elements. With that background, we
will cover specific graphical techniques for simple data and move on to more
complex genotype-by-environment interactions that include biplots, partial
least squares, and stability measures. We will look at data for field
experiments and consider aspects of data quality and graphics for visualizing
the results of mixed models. We will touch briefly on semi-graphical
techniques, dynamic graphics, and a gallery of graphics. Finally, we will
consider how to get from a basic graphic to a polished product ready for
presentation or publication. Graphics in R will be discussed briefly along
the way.
Break: 10:00 – 10:15 am
Lunch: Noon – 1:00 pm in Big Basin Ballroom D
Break: 2:30 – 2:45 pm
Please note: Break times are approximate
Monday – April 29th
8:00 –10:00 am
8:30 – 8:45 am
Registration for Conference Participants,
Pre-Convene Continental Breakfast in Foyer ABCD
Welcome
ABC-McDowell/Tuttle/Alcove
Session #1A, 8:45 – 9:15 am
Non-Normal Data in Agricultural Experiments
Walt Stroup – University of Nebraska-Lincoln
ABC-McDowell/Tuttle/Alcove
Once there were two ways to deal with non-normal data from designed
experiments. The first assumed the robustness of the Central Limit Theorem:
ANOVA tests means; sample means are approximately normal even if the data
aren’t; if the experiment is well-designed, all will be well. Some derided this as the
“maybe if we don’t acknowledge it, it won’t really exist” approach. The other
approach was to transform the data. Arguably there was a third way –
nonparametric methods – but nonparametrics are not well-suited to complex
experiments and thus their use in agriculture has been limited. Advances in
computers and modeling over the past couple of decades have greatly expanded
our options. In theory, we can apply generalized and mixed models to experiments
of near arbitrary complexity with data from a wide variety of distributions. With
expanded options come dilemmas. We have software choices – R, SAS among many
others. Models have conditional and marginal formulations. There are GLMMs,
GEEs among a host of other acronyms. There are different estimation methods –
linearization (e.g. pseudo-likelihood), integral approximation (e.g. quadrature) and
Bayesian methods. How do we decide what to use? How much, if anything, do we
lose if we ignore the new and trendy stuff and revert to transformations? I am
tempted to call this talk “When does the CLT CYA and when is it a TLA that fails to
CYA?” In 2011, I introduced a design-to-model thought process I called WWFD
(What Would Fisher Do – inspired by Fisher’s comments in a 1935 JRSS publication).
In this talk, I’ll show how we can use this process to clarify our thinking about
probability processes we conceptualize as giving rise to data in designed
experiments and how we can use the results to help us understand what the
various options for non-normal data actually do, how to evaluate their small-sample
behavior, and how to make informed choices based on these insights.
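One way to see what is at stake in "reverting to transformations" (a generic illustration, not an example from the talk): for right-skewed data, back-transforming the mean of the logs estimates the median of the original scale, not the mean, so the two approaches answer different questions.

```python
import math

# For lognormal data with log-scale parameters mu and sigma, the
# back-transformed mean of the logs recovers exp(mu), the median,
# while the true mean is exp(mu + sigma**2 / 2).
# mu and sigma are illustrative values.
mu, sigma = 1.0, 0.8

median = math.exp(mu)                  # target of an analysis on the log scale
mean = math.exp(mu + sigma**2 / 2)     # target of an analysis on the raw scale

gap_pct = 100 * (mean - median) / mean
print(f"median = {median:.3f}, mean = {mean:.3f}, gap = {gap_pct:.1f}%")
```

The gap grows with the variance, which is one reason the choice between transformation and a generalized model is not merely cosmetic.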
Session #1B, 9:15 – 9:45 am
ABC-McDowell/Tuttle/Alcove
On the Small Sample Behavior of Generalized Linear Mixed Models with Complex
Experiments
J. Couton and W. W. Stroup – University of Nebraska-Lincoln
Generalized linear mixed models (GLMMs), regardless of the software used to
implement them (R, SAS, etc.), can be formulated as conditional or marginal models
and can be computed using pseudo-likelihood, penalized quasi-likelihood, or
integral approximation methods. While information exists about the small sample
behavior of GLMMs for some cases (notably RCBDs with Binomial or count data), little is known about GLMMs for continuous proportions (e.g. Beta), time-to-event (e.g. Gamma) data, or for more complex designs such as the split-plot. In this
presentation we review the major model formulation and estimation options and
compare their small sample performance for cases listed above.
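The conditional/marginal distinction reviewed above can be made concrete for a logit-link GLMM with a random intercept. The linear-predictor value and variance below are hypothetical; the integral is evaluated with Gauss-Hermite quadrature, one of the approximation methods the abstract mentions.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical logit-scale fixed effect and random-intercept SD (illustrative).
eta, sigma = 1.0, 1.0

# Conditional (subject-specific) probability, evaluated at u = 0.
p_conditional = expit(eta)

# Marginal (population-averaged) probability E[expit(eta + u)], u ~ N(0, sigma^2),
# computed by Gauss-Hermite quadrature.
nodes, weights = np.polynomial.hermite.hermgauss(40)
p_marginal = float(
    np.sum(weights * expit(eta + sigma * np.sqrt(2.0) * nodes)) / np.sqrt(np.pi)
)

print(f"conditional = {p_conditional:.3f}, marginal = {p_marginal:.3f}")
```

The marginal probability is attenuated toward 0.5 relative to the conditional one, so the two formulations estimate genuinely different quantities.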
Session #1C, 9:45 – 10:15 am
ABC-McDowell/Tuttle/Alcove
Estimation of Dose Requirements for Extreme Levels of Efficacy
Mark West and Guy Hallman – USDA Agricultural Research Service
The objective of this paper is to explore the extent to which dose-response models
may be used to estimate extreme levels of efficacy for controlling insect pests and
possibly other uses. Probit-9 mortality (99.9968% mortality) is a standard for
treatment effectiveness in tephritid fruit fly research, and has been adopted by the
United States Department of Agriculture for fruit flies and other pests. Data taken
from the phytosanitary treatment (PT) literature are analyzed. These data are used
to fit dose-response models with logit, probit and complementary log-log links. The
effectiveness of these models for predicting extreme levels of efficacy is compared
using large (~100,000+ individuals) confirmatory trials that are also reported in the
PT literature. We examine the role of model goodness-of-fit as a requirement for
obtaining reliable dose requirements.
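Inverting a fitted dose-response model at probit-9 is a short calculation. On the classical probit scale (z + 5), probit-9 corresponds to a standard-normal quantile near 4; the model coefficients below are illustrative, not values from the paper.

```python
from statistics import NormalDist

# Probit-9 mortality: 99.9968% kill, i.e. a standard-normal quantile near 4.
z9 = NormalDist().inv_cdf(0.999968)

# Hypothetical fitted probit model on log10(dose): z = b0 + b1 * log10(dose).
# b0 and b1 are illustrative values only.
b0, b1 = -2.0, 1.5
log10_dose = (z9 - b0) / b1
dose = 10 ** log10_dose

print(f"z for probit-9 = {z9:.3f}, required dose = {dose:.1f}")
```

Because probit-9 sits far in the tail of the fitted curve, the extrapolated dose is very sensitive to the choice of link, which is exactly why the paper checks predictions against large confirmatory trials.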
10:15 am
Break & Poster Session
Big Basin Ballroom D
Session #2, KEYNOTE ADDRESS, 10:45 – 12:00 pm ABC-McDowell/Tuttle/Alcove
Issues in Statistical and Graphical Literacy
Kevin Wright – Research Scientist at DuPont Pioneer
One of the defining attributes of statistics is the study of variability. Yet the results
of statistical analyses often are presented as point estimates, sometimes without
context and without conveying the variability, and without answering the specific
question.
Examples:
+ Does salt contribute to high blood pressure? Have you *seen* the result, or only
*heard* the result?
+ Are you troubled to hear that exposure to a carcinogen doubles your risk of
cancer? What if this risk was visualized in comparison to other risks?
+ HIV tests report positive/negative. How accurate is this test? How can this
accuracy be *visualized* for consumers?
+ What can be learned from a single p-value? What can be learned by looking at
the distribution of p-values? How does this relate to the number of published
papers that are not reproducible?
+ Monthly changes in the unemployment rate are often explained. How often is the
explanation: "no change within the bounds of sampling error"? Would a graphic like
Sparklines help?
+ Forensic experts can provide a probability that two DNA samples come from the
same person. But the real question is different: "What is the probability of *a particular sample* of DNA matching a defendant?" Answering that question necessitates consideration of factors such as the probability of laboratory error.
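The HIV-test bullet above is, at heart, Bayes' rule. With hypothetical (illustrative) sensitivity, specificity, and prevalence, even a very accurate test can have a modest positive predictive value:

```python
# Hypothetical test characteristics and prevalence (illustrative only):
sensitivity = 0.997    # P(positive | infected)
specificity = 0.9985   # P(negative | not infected)
prevalence = 0.003     # P(infected) in the screened population

# Bayes' rule: P(infected | positive).
p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
ppv = sensitivity * prevalence / p_pos

print(f"P(infected | positive) = {ppv:.3f}")
```

Here a test that is better than 99.8% accurate still leaves roughly one positive in three uninfected, a result that is far easier to grasp when visualized as counts of people rather than probabilities.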
12:00 pm
Lunch
Big Basin Ballroom D
Session #3A, 1:30 – 2:00 pm
ABC-McDowell/Tuttle/Alcove
Five things I wish my Mother had told me, about Statistics that is
Philip Dixon – Iowa State University
This talk is a collection of data analysis stories that illustrate some general points
that I wish I had learnt a lot earlier in my career. These include: 1) Simpson’s
paradox is everywhere. 2) A numerical optimization routine may report that it has converged when it really has not. 3) You can’t
always trust the Satterthwaite approximation. Be especially careful if you need to
estimate a linear combination of variance components that has one or more
negative coefficients. 4) BLUPs are wonderful things. 5) It’s good to know Reverend
Bayes. He can be a great help in many problems.
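Point 1 is easy to demonstrate with the classic kidney-stone data (Charig et al. 1986), where treatment A wins in both subgroups yet loses overall because it was assigned mostly to the harder cases:

```python
# Classic Simpson's paradox example (Charig et al. 1986 kidney-stone data):
# (successes, patients) by treatment and stone size.
counts = {
    ("A", "small"): (81, 87),   ("A", "large"): (192, 263),
    ("B", "small"): (234, 270), ("B", "large"): (55, 80),
}
rate = {k: s / n for k, (s, n) in counts.items()}

# Pooling over stone size reverses the comparison.
overall_A = (81 + 192) / (87 + 263)
overall_B = (234 + 55) / (270 + 80)

print(f"A: {rate[('A','small')]:.2f}/{rate[('A','large')]:.2f}, overall {overall_A:.2f}")
print(f"B: {rate[('B','small')]:.2f}/{rate[('B','large')]:.2f}, overall {overall_B:.2f}")
```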
Session #3B, 2:00 – 2:30 pm
ABC-McDowell/Tuttle/Alcove
Thou shall not brush your teeth while eating breakfast – a 7-step program for
researchers previously hurt in data analysis
Edzard van Santen – Auburn University
After years of providing statistical advice to fellow faculty members and graduate
students I have come to realize that it is not necessarily the big issues but lack of
knowledge of basic data analysis principles that get my clients into trouble. My
claim is that if researchers and students internalized two basic definitions they
would not have any problems analyzing most of their experiments. The definitions
of Experimental Unit (EU) as the smallest physical unit to which a treatment may be
applied and Experimental Error (Exp. Err.) as the variation among EUs treated alike
are the basis for successful data analysis of experiments. I follow a seven-step data
analysis program for my graduate student and faculty clients: (1) Understanding the
experiment; (2) Checking the data; (3) Getting a feel for the data; (4) Checking
underlying assumptions; (5) Testing; (6) Estimating; and (7) Interpreting results.
Clients who have adhered to the program generally have had fewer problems than
clients who for some reason or another did not get on board with the program. I will
also touch on the implications for teaching experimental design and data analysis to
non-statistics majors.
Session #3C, 2:30 – 3:00 pm
ABC-McDowell/Tuttle/Alcove
Use and Misuse of Multiple Comparison Procedures of Means in Factorial
Experiments
Siraj Omer – Agricultural Research Corporation
Multiple comparison procedures of means are frequently misused and such misuse
may result in incorrect scientific conclusions. A review of 150 papers published in the Sudan Journal of Agricultural Research (SJAR) from 2005 to 2010 showed that some form of mean comparison procedure was used for factorial experiments. The objectives of this study were to identify the most common errors made in the use of multiple comparison procedures of means in factorial experiments and to present correct methods. In 30% and 20% of these papers, respectively, pair-wise tests and multiple comparison tests (MCT) were used incorrectly, and only 20% could be considered entirely correct. Misuses of multiple comparison procedures (MCP) occurred in comparisons of levels of a quantitative factor, comparisons of treatment means in a factorial arrangement, and planned contrasts. In some cases, entirely incorrect Duncan multiple range tests were performed. In conclusion, there is a need for sound statistical reasoning in choosing appropriate multiple comparisons of means in factorial experiments, for both qualitative and quantitative factor levels, so that the correct statistical differences are assessed. Only when an experiment receives adequate statistical analysis can its results be judged for validity and serve as a basis for the design of future experiments.
3:00 pm
Break & Poster Session
Big Basin Ballroom D
Session #4A, 3:30 – 4:00 pm
ABC-McDowell/Tuttle/Alcove
Characterizing Benthic Macroinvertebrate Community Responses to Nutrient
Addition Using NMDS and BACI Analyses
Bahman Shafii and William Price – University of Idaho
Wayne Minshall – Idaho State University
Charlie Holderman – Kootenai Tribe of Idaho
Paul Anders – Cramer Fish Sciences
Gary Lester and Pat Barrett – EcoAnalysts, Inc.
Nonmetric multidimensional scaling (NMDS) is an ordination technique which is
often used for information visualization and exploring similarities or dissimilarities
in ecological data. In principle, NMDS maximizes rank-order correlation between
distance measures and distance in the ordination space. Ordination points are
adjusted in a manner that minimizes stress, where stress is defined as a measure of
the discordance between the two kinds of distances. Before and After Control
Impact (BACI) is a classical analysis of variance method for measuring the potential
influence of an environmental disturbance. Such effects can be assessed by
comparing conditions before and after a planned activity. In certain ecological
applications, the extent of the impact is also expressed relative to conditions in a
control area, after a particular anthropogenic activity has occurred. In this paper,
two statistical techniques are employed to investigate the effect of stream nutrient
addition on a benthic macroinvertebrate community. The clustering of sampling
units, based on multiple macroinvertebrate metrics across pre-determined river
zones, is explored using NMDS. BACI is subsequently used to test for the potential
impact of nutrient addition on the specified macroinvertebrate response metrics.
The combination of the two approaches provides a powerful and sensitive tool for
detecting complex second-order effects in the river food chains. Statistical
techniques are demonstrated using eight years of benthic macroinvertebrate survey
data collected on an ultra-oligotrophic reach of the Kootenai River in Northern
Idaho and Western Montana downstream from a hydro-electric dam.
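The BACI contrast described above is the before/after-by-control/impact interaction; a minimal sketch with hypothetical group means (not values from the Kootenai study):

```python
# BACI effect as the interaction contrast, using hypothetical mean responses
# (e.g. a macroinvertebrate metric); illustrative values only.
means = {
    ("control", "before"): 10.0, ("control", "after"): 11.0,
    ("impact",  "before"): 10.5, ("impact",  "after"): 14.5,
}

# Change at the impact site, net of the change at the control site.
baci_effect = (means[("impact", "after")] - means[("impact", "before")]) \
            - (means[("control", "after")] - means[("control", "before")])

print(f"BACI interaction effect = {baci_effect}")
```

Subtracting the control-site change removes the background trend shared by both areas, so only the impact-specific shift remains.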
Session #4B, 4:00 – 4:30 pm
ABC-McDowell/Tuttle/Alcove
Fitting population models when detection is imperfect
Trevor Hefley, Andrew Tyre and Erin Blankenship – University of Nebraska-Lincoln
Population time series data from field studies are complex and statistical analysis
requires models that describe nonlinear population dynamics and observational
errors. State-space formulations of stochastic population growth models have been
used to account for measurement error caused by the data collection process.
Parameter estimation, inference, and prediction are all sensitive to measurement
error assumptions. In particular, the observational process may also result in
incomplete detection of individuals. We developed an N-mixture state-space
modeling framework to estimate and correct for errors in detection while
estimating population model parameters. We tested our methods using simulated
data sets and compared the results to those obtained with state-space models
when detection is perfect and when detection is ignored. Our N-mixture state-space model yielded parameter estimates of similar quality to a state-space model
when detection is perfect. Our results show that ignoring detection errors in
population time series analysis can lead to disastrously wrong estimates. We
recommend that researchers consider the possibility of detection errors when
analyzing population time series data.
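The marginalization underlying an N-mixture model can be sketched for a single count. With Poisson abundance and binomial detection (standard distributional assumptions for this model class, not the authors' full state-space model), summing over the latent abundance N recovers the closed-form Poisson marginal by the thinning identity:

```python
import math

# N-mixture building block: N ~ Poisson(lam), y | N ~ Binomial(N, p).
# Marginally, y ~ Poisson(lam * p) (Poisson thinning); verified here by
# truncating the sum over N. lam, p, and y are illustrative values.
lam, p, y = 6.0, 0.4, 3

def pois(k, mu):
    return math.exp(-mu) * mu**k / math.factorial(k)

def binom(k, n, q):
    return math.comb(n, k) * q**k * (1 - q)**(n - k)

# Truncate at N = 59; the Poisson(6) tail beyond that is negligible.
marginal = sum(pois(N, lam) * binom(y, N, p) for N in range(y, 60))
closed_form = pois(y, lam * p)

print(f"summed = {marginal:.10f}, Poisson(lam*p) = {closed_form:.10f}")
```

The practical point mirrors the abstract: if detection probability p is ignored, the observed counts estimate lam * p rather than the true abundance parameter lam.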
5:00 – 7:00 pm Flint Hills Discovery Center, 25th Annual Conference Celebration
8:30 – 10:30 pm Kansas Country Dance at the Hilton Garden Inn and Convention
Center in Big Basin Ballroom D
Tuesday – April 30th
Continental Breakfast is Available in Foyer ABCD
Session #5A, 8:30-9:00 am
ABC-McDowell/Tuttle/Alcove
Accounting for heterogeneous pleiotropy in whole genome selection models
N. M. Bello – Kansas State University;
J. P. Steibel and R. J. Tempelman – Michigan State University
The additive genetic correlation between economically relevant traits is generally
considered a critical factor determining the relative advantage of multi-trait models
over single-trait models for whole genome prediction of genetic merit. Yet, the
additive genetic correlation between traits may be considered an aggregate
summary of between-trait correlations at the individual QTL level, thereby defining
pleiotropic mechanisms by which individual genes have simultaneous effects on
multiple phenotypic traits. Pleiotropic effects, in turn, may be gene specific and
heterogeneous across the genome. In this study, we present a hierarchical Bayesian
extension to bivariate genomic prediction models that accounts for heterogeneous
pleiotropic effects across SNP markers. More specifically, we elicit a function of the
SNP marker-specific correlation between traits as heterogeneous across markers
following a square-root Cholesky reparameterization of the marker-specific
covariance matrix that ensures necessary positive semidefinite constraints. We use
simulation studies to demonstrate the properties of the proposed methods. We
assess the relative performance of the proposed method by comparing prediction
accuracy for genomic breeding values and for SNP marker effects for each of two
traits across putative scenarios of homogeneous and heterogeneous pleiotropic
genetic mechanisms. We also consider extensive model comparisons for cases of
null and non-null additive genetic correlations under conditions of high and low
heritability of the traits of interest. Overall, the relative advantage of genomic
prediction bivariate models that account for heterogeneous pleiotropy relative to
their univariate counterparts was of small magnitude and seemed to depend upon
trait heritability and genetic architecture of the pleiotropic mechanisms. The trade-off between methodological and computational modeling complexity and net gain
in prediction accuracy is also discussed.
Session #5B, 9:00 – 9:30 am
ABC-McDowell/Tuttle/Alcove
Comparing Functional Data Analysis and Hysteresis Loops when Testing
Treatments for Reducing Heat Stress in Dairy Cows
Spencer Maynes, A.M. Parkhurst, T. L. Mader – University of Nebraska-Lincoln
J. B. Gaughan – The University of Queensland, Gatton, Australia
Average yearly monetary losses due to heat stress in dairy cattle have been
estimated at $897 million in the US alone. Various techniques are commonly used
to reduce heat stress, including sprayers and misters, shading, and changes in feed.
Oftentimes studies are performed where researchers do not control the times
when animals use shading or other means available to reduce heat stress, making it
hard to test differences between treatments. Two methods are used on data from a
study where Holstein cows were given free access to weight activated “cow
showers.” Functional data analysis, or FDA, can be used to model body temperature
as a function of time and environmental variables such as the Heat Load Index.
Differences between treatment groups can be tested using functional analysis of
variance. Alternatively, hysteresis loops, such as the ellipse, formed by a plot of air
temperature or the Heat Load Index against body temperature over the course of a
day can be estimated and their parameters used to test differences between cows
with access to showers and cows without. An R package developed at UNL, hysteresis, which can estimate these loops and their parameters, is demonstrated. Functional
data analysis allows for looser assumptions regarding the body temperature curve
and the ability to look for differences between groups at specific time points, while
hysteresis loops give the ability to look at heat stress over the course of a day
holistically in terms of parameters such as amplitude, lag, central values and area.
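The loop geometry in the final sentence can be made concrete. Under the simplifying assumption of a unit-amplitude sinusoidal driver and a lagged sinusoidal response (an illustrative geometry, not the hysteresis package's actual estimation method), the enclosed loop area equals pi times the sine of the lag, so the fitted ellipse's area encodes the lag:

```python
import numpy as np

# x(t) = sin(t), y(t) = sin(t - lag) traces an ellipse whose enclosed
# area is pi * sin(lag). The lag value here is illustrative.
lag = 0.5
t = np.linspace(0.0, 2.0 * np.pi, 100_000, endpoint=False)
x, y = np.sin(t), np.sin(t - lag)

# Shoelace formula over the closed polygon.
area = 0.5 * abs(np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y))

print(f"loop area = {area:.5f}, pi*sin(lag) = {np.pi * np.sin(lag):.5f}")
```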
Session #5C, 9:30 – 10:00 am
ABC-McDowell/Tuttle/Alcove
Using Functional Data Analysis to Evaluate Effect of Shade on Body Temperature
of Feedlot Heifers During Environmental Heat Stress
Fan Yan, A. M. Parkhurst – University of Nebraska-Lincoln
C. N. Lee – University of Hawaii-Manoa
Heat stress can be a serious problem for cattle. Body temperature (Tb) is a good
measure of an animal’s thermo-regulatory response to an environmental thermal
challenge. Previous studies performed in controlled chambers found that Tb
increases in response to increasing ambient temperature. However, when animals
are in an uncontrolled environment, Tb is subject to many uncontrolled
environmental factors, such as radiation, wind, humidity, etc., that increase
variation in the data. Hence, functional data analysis (FDA) was applied to model Tb
as curves over two weeks (from July 27 to Aug 5) for animals exposed to
uncontrolled environmental factors. Breed (Angus, MARC-III, MARC-I, Charolais)
and availability of shade (access versus no access to sun shade) were incorporated
as treatment factors in the statistical model. This study illustrates the potential of
FDA to retain all information in the curves. The specific objectives of this study are
to use FDA to smooth Tb with large variation, to detect treatment effects on Tb,
and to assess the interactions between breed and availability of shade with
functional regression coefficients. The results show that FDA can be used to detect
significant treatment interactions that may otherwise remain undetected using
regular linear or nonlinear models. The significant interactions indicate that access
to sun shade influences the way animals respond to a thermal challenge. Overall, it
was found that breeds of cattle with dark-hides were more affected by temperature
changes and peak temperatures than breeds of cattle with light-hides. Angus cattle
(black) had the highest body temperatures in both shade and no shade areas, while
Charolais (white) had the lowest body temperatures in the no shade area. However,
the interaction showed MARC III (dark red) experienced the largest temperature
differential between shade and no shade. Therefore, breed and availability of shade
interactions are important considerations when making predictions to aid in
management decisions involving feedlot cattle.
10:00 – 11:00 am
Break & Poster Session
Big Basin Ballroom D
Session #6A, 11:00 – 11:30 am
ABC-McDowell/Tuttle/Alcove
Statistical Methods for Identifying Gene Expression Heterosis
Dan Nettleton, Peng Liu and Jarad Niemi – Iowa State University
Tieming Ji – University of Missouri
Heng Wang – Michigan State University
Heterosis, also known as hybrid vigor, occurs when the mean trait value of offspring
is more extreme than that of either parent. Well before heterosis was first
scientifically described by Darwin in 1876, humans had been using heterosis for
various practical purposes. Within the last century, heterosis has been used to
improve many crop species for food, feed, and fuel industries. Despite intensive
study and successful utilization of heterosis, the basic molecular genetic
mechanisms responsible for heterosis remain unclear. In an effort to better
understand the underlying mechanisms, researchers have begun to measure the
expression levels of thousands of genes in parental lines and their hybrid offspring.
The expression level of each gene can be viewed as a trait alongside more
traditional traits like plant height, grain yield, or drought tolerance. This talk will
describe statistical methods that can be used to search for genes that exhibit
expression heterosis. We will briefly discuss the challenges of testing for heterosis
with data from a single trait and then shift to simultaneous analysis for thousands
of gene expression traits. Hierarchical modeling strategies for dealing with both
continuous gene expression data from microarrays and count-based expression
data from next-generation sequencing of RNA will be presented.
Session #6B, 11:30 – 12:00 noon
ABC-McDowell/Tuttle/Alcove
Detecting Factors Associated with Yield Stability
Jixiang Wu, Karl Glover and William Berzonsky – South Dakota State University
Traditional yield stability analyses, such as the single-regression-based method and additive main effects and multiplicative interaction analysis, focus on yield itself. It is likely that the yield stability of a genotype is associated with many factors, such as fertilizer level, soil type, weather conditions, and/or yield components that may be associated with yield performance. Detection of factors highly associated with
yield stability will help breeders develop cultivars adapted to different
environments or specific environments. In this study, we proposed a new method, a
multiple linear regression method, to detect factors associated with yield stability. A
resampling method, the bootstrap, was integrated into this new method to test the significance of each parameter of interest. For demonstration, a data
set with 22 spring wheat genotypes evaluated in 18 environments in South Dakota
from 2009-2011 will be analyzed and reported in this presentation.
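The bootstrap significance check described above can be sketched generically: resample (x, y) pairs, refit the regression, and read a percentile interval off the bootstrap distribution. The data, slope, and settings below are hypothetical, not from the wheat trial.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical regression of a stability-related response on one factor;
# the true slope of 2 is illustrative.
n = 50
x = rng.uniform(0.0, 10.0, n)
y = 2.0 * x + rng.normal(0.0, 1.0, n)

def slope(xs, ys):
    return np.polyfit(xs, ys, 1)[0]

# Percentile bootstrap over resampled (x, y) pairs.
boot = np.empty(2000)
for b in range(2000):
    idx = rng.integers(0, n, n)
    boot[b] = slope(x[idx], y[idx])

ci_lo, ci_hi = np.percentile(boot, [2.5, 97.5])
print(f"95% bootstrap CI for slope: ({ci_lo:.2f}, {ci_hi:.2f})")
```

A factor whose interval excludes zero would be flagged as significantly associated with stability; pair resampling makes no assumption about the error distribution.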
12:00 pm
Lunch
Big Basin Ballroom D
Session #7A, 1:30 – 2:00 pm
ABC-McDowell/Tuttle/Alcove
Construction of disease risk scoring systems using logistic group lasso: application
to porcine reproductive and respiratory syndrome survey data
Hui Lin, Chong Wang, Peng Liu and Derald Holtkamp – Iowa State University
We propose to utilize the group lasso algorithm for logistic regression to construct a
risk scoring system for predicting disease in swine. This work is motivated by the
need to develop a risk scoring system from survey data on risk factor for porcine
reproductive and respiratory syndrome (PRRS), which is a major health, production
and financial problem for swine producers in nearly every country. Group lasso
provides an attractive solution to this research question because of its ability to
achieve group variable selection and stabilize parameter estimates at the same
time. We propose to choose the penalty parameter for group lasso through leave-one-out cross-validation, using the criterion of the area under the receiver
operating characteristic curve. Survey data for 896 swine breeding herd sites in the
USA and Canada completed between March 2005 and March 2009 are used to
construct the risk scoring system for predicting PRRS outbreaks in swine. We show
that our scoring system for PRRS significantly improves the current scoring system
that is based on an expert opinion. We also show that our proposed scoring system
is superior, in terms of area under the curve, to one developed using a multiple logistic regression model selected on the basis of variable significance.
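The area-under-the-ROC-curve criterion used above has a concrete interpretation: it is the probability that a randomly chosen positive site outscores a randomly chosen negative one. A toy computation with made-up scores:

```python
# AUC as the Mann-Whitney probability that a random positive (e.g. a
# PRRS-outbreak herd) outscores a random negative herd; ties count half.
# Scores are illustrative, not from the survey data.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]

wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
auc = wins / (len(pos) * len(neg))

print(f"AUC = {auc:.4f}")
```

An AUC of 0.5 is chance-level ranking and 1.0 is perfect separation, which is why it is a natural criterion for tuning a risk-scoring model.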
Session #7B, 2:00 – 2:30 pm
ABC-McDowell/Tuttle/Alcove
Additive-Dominance Model with Sub-block Effect: An Extended Model to Improve
Plant Breeding Data Analysis
Krishna Bondalapati, Jixiang Wu and Karl Glover – South Dakota State University
Experiments in plant breeding trials often involve testing a large number of
genotypes arranged in a rectangular plot with rows and columns. Field variation
may have a significant impact on prediction of genetic effects when many
genotypes are evaluated in a large field. Lattice-based models are widely used to
control field variation. Examples of such methods include, but are not limited to, row-column α-designs and block designs with nested rows and columns. In these
designs, heterogeneity is eliminated in two directions (rows and columns) within
each block. Field block shape, however, may not always be rectangular due to space
limitations. In this study, we address this problem by including either single or
multiple rows or columns as sub-blocks with no specific assumption on the number
of genotypes tested or block size. Without loss of generality, an additive-dominance
(AD) model was extended and applied to a spring wheat breeding dataset to
demonstrate the use of sub-block based genetic models. Three agronomic traits
(grain yield, plant height and time-to-flowering) were analyzed using the extended
AD model. Results based on simulation showed that these data can be effectively
analyzed using the extended AD model. Actual data analysis revealed that grain yield
and plant height were significantly influenced by the field variation. Additive effects
were significant for grain yield and plant height, and dominance effects were
significant for plant height. Except for environmental variation, other sources of variation were insignificant for time-to-flowering. Most spring wheat lines developed by the South Dakota State University breeding program (SD lines) exhibited good general combining ability effects for yield improvement and for reducing plant height. This study provides a flexible method to improve genetic data analysis by reducing field variation.
Session #7C, 2:30 – 3:00 pm
ABC-McDowell/Tuttle/Alcove
Restricted Latent Class Multiple Imputation Method of Categorical Missing Data
Qiao Ma – University of Nebraska-Lincoln
Multiple imputation is a commonly used method for dealing with incomplete data sets and is used by researchers at many different analytical levels. Imputation
substitutes missing data with some values instead of discarding the entire case from
the analysis. While dealing with large data sets with more than a few incomplete
categorical variables, it is not possible to apply log-linear modeling because of sparseness: the full multi-way cross-tabulation required for log-linear analysis cannot be set up and processed. The latent class
model is a plausible multiple imputation tool to solve this problem (Vermunt 2008).
Another possible solution, when only a limited number of categorical variables is associated with the log-linear method, is to use hot-deck imputation (Rubin 1987). In this study,
several multiple imputation methods for large categorical datasets will be tested.
An advanced restricted latent class model-based multiple imputation method is
proposed to be a better, more representative approach than the unrestricted latent
class model since it specifies equality and inequality constraints on sums of
conditional response probabilities.
Session #7D, 3:00 – 3:30 pm
ABC-McDowell/Tuttle/Alcove
Construction of measure of second order slope rotatable designs using balanced
incomplete block designs
B. Re. Victorbabu and Ch. V.V.S. Surekha – Acharya Nagarjuna University
Hader and Park (1978) introduced slope rotatable central composite designs
(SRCCD). Victorbabu and Narasimham (1991) studied in detail the conditions to be satisfied by general second order slope rotatable designs (SOSRD) and constructed SOSRD using balanced incomplete block designs (BIBD). Victorbabu (2007) presented a review of SOSRD.
Park and Kim (1992) suggested a measure of slope rotatability for second order
response experimental designs. Jang and Park (1993) suggested a measure and a
graphical method for evaluating slope rotatability in response surface designs.
Victorbabu and Surekha (2011) constructed a measure of SRCCD. These measures enable us to assess the degree of slope rotatability for a given second order response surface design.
In this paper, a new method of constructing a measure of second order slope rotatable designs using balanced incomplete block designs is suggested, which enables us to assess the degree of slope rotatability for a given response surface design.
 Poster
Big Basin Ballroom D
Heritability estimates for F2 wheat populations using an additive-additive-environment (AAE) genetic model
Mauricio Erazo-Barradas and Jixiang Wu – South Dakota State University
Heritability refers to the effectiveness of selection for a trait in a breeding population. It is very important that heritability for traits of
importance is appropriately calculated based on specific genetic models and
data structures. In plant breeding experiments where seed resources and
land availability can be limited for a large number of F1 or F2 hybrids to be
evaluated, a reduced genetic model like an additive-additive-environment
(AAE) model can be used when non-repeated plots for crosses are
employed. Compared to analysis of variance methods, mixed model
approaches have the advantage of estimating variance components and
predicting genetic effects for various types of data structures. For this
research, a spring wheat data set containing bi-parental and multi-parental
F2 populations grown in two environments with no field replications was
analyzed subject to an AAE model. Genetic variance components and
heritability were estimated for five important agronomic traits: heading
date, plant height, grain yield, grain protein, and test weight. Additive and
additive-environment variances were added together to calculate narrow-sense heritability; values of 71.83, 65.28, 58.47, 44.06, and 66.06
were obtained as estimates of narrow-sense heritability for heading date,
plant height, grain yield, grain protein, and test weight respectively. Based
on these results, we can conclude that the expression of additive genetic
effects for these traits could be environment-specific. Additive effects,
equivalent to general combining ability effects, were detected for all the
variables except grain protein, this will allow the breeder to select in early
generations from these wheat crosses when selecting for these traits.
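The heritability calculation described above sums the additive and additive-by-environment variance components in the numerator. A minimal sketch, using hypothetical variance values (not the study's estimates) and assuming the phenotypic variance partitions as V_P = V_A + V_AE + V_e:

```python
def narrow_sense_heritability(v_a, v_ae, v_e):
    # Narrow-sense heritability as a percentage, with the additive (v_a)
    # and additive-by-environment (v_ae) variances summed in the numerator,
    # as in the abstract. v_e is the residual variance (an assumed term).
    v_p = v_a + v_ae + v_e  # total phenotypic variance under this partition
    return 100.0 * (v_a + v_ae) / v_p
```

For example, hypothetical components of 50, 20, and 30 would give a heritability estimate of 70.0.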
 Poster
Big Basin Ballroom D
Credit access and technical efficiency of small ruminant production among rural
women farmers in Nasarawa State, Nigeria
Asenath K. F. Silong, Christopher Garforth and Dr. Sarah P. Cardey – University of
Reading, UK
As in many countries in Sub-Saharan Africa, agriculture employs nearly three-quarters
of Nigeria's population and is the principal source of food and livelihood. Agriculture
is therefore a critical component of programs that seek to reduce poverty
and attain food security in Nigeria. Women constitute 49% of the Nigerian
population; 72% of them live in rural areas and form the core of the
agricultural sector. Compared to crop production, the participation of rural
women in livestock-related activities in Nigeria is much higher and makes a significant
contribution to their livelihoods and the national economy. However, rural women
farmers in Nigeria are characterized by small holdings and rudimentary
farming methods. Like any other agro-business venture, livestock production
uses resource inputs. To overcome the problems of poor performance and declining
productivity, the available resources have to be utilized efficiently. Efficiency
measurement is therefore necessary for monitoring productivity, and it
benefits economies by ascertaining the extent to which productivity can be increased
using available resources and technologies. Such studies provide a
foundation for the development and adoption of new methods that could push
the frontier of production still higher. The main objective of the study is to
investigate the major factors influencing the efficiency of livestock production
among women in rural areas of Nasarawa State, Nigeria, with a specific
focus on the influence of credit and gender power relations on their technical
efficiency. Data collection has just been concluded. A multi-stage sampling
technique was used to select respondents. Semi-structured
interviews and focus group discussions were used as instruments for data
collection. Descriptive statistics, multiple linear regression, and stochastic
frontier production functions will be used to analyze the quantitative data.
Qualitative data will be analyzed using NVivo.
 Poster
Big Basin Ballroom D
An Improved FWER-Controlling Method in Gene Ontology Graphs
Garrett Saunders, John Stevens and Clay Isom – Utah State University
The Gene Ontology (GO) provides three separate structured vocabularies by which
genes specifying biological functions can be grouped as either a biological process,
cellular component, or molecular function. The structure of each of these
vocabularies is such that genes are mapped to certain (typically many) biological
processes (molecular functions or cellular components) and designated as single GO
terms logically linked as nodes in a directed acyclic graph. While either a gene set
test (like Goeman's Global test) or a p-value combination method (like Fisher's or
Stouffer's method) is used to test each GO term within, say, all biological processes,
the number of tests performed requires a multiplicity adjustment to control
the family-wise error rate (FWER). Goeman and Mansmann (2008) proposed a
focus level adjustment method which exploits the structure of the GO graph to
adjust for multiplicity while controlling the FWER at a pre-specified level. This work
proposes an improvement to the focus level method which also controls the FWER
at a specified level. The advantage of the new method is the ability to
consider GO terms individually, and not just within their context in the GO graph
as in the focus level method. The new method is also shown to have
increased power over the focus level method to detect significant GO terms. The
resulting method is demonstrated with an application to identifying biological process
differences between in vivo and in vitro matured pig embryos.
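Fisher's method, named above as one way to test a GO term, combines the p-values of the genes mapped to that term. A minimal sketch (a textbook illustration, not the focus level adjustment itself), using the closed-form chi-square survival function available for even degrees of freedom:

```python
from math import exp, log, factorial

def fisher_combine(pvals):
    # Fisher's method: X = -2 * sum(ln p_i) follows a chi-square
    # distribution with 2k degrees of freedom under the null.
    x = -2.0 * sum(log(p) for p in pvals)
    k = len(pvals)
    # For even df (here 2k), the chi-square survival function is
    # P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!
    return exp(-x / 2) * sum((x / 2) ** i / factorial(i) for i in range(k))
```

A single p-value is returned unchanged (for one test, `fisher_combine([0.5])` is 0.5), and smaller input p-values yield a smaller combined p-value.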
 Poster
Big Basin Ballroom D
Soybean Yield Stability Analysis by Conditional AMMI method
Kaushal Raj Chaudhary and Jixiang Wu – South Dakota State University
Grain yield is a complex trait affected by numerous factors, including
environment and genotype. Yield and its yield components
were collected for fifteen soybean cultivars in 2011 and 2012. Multiple
regression revealed that several yield components contributed significantly
to the total variation in yield. In this study, a conditional model was used to
detect associations of candidate yield components with yield stability. First,
a mixed linear model approach, minimum norm quadratic unbiased estimation
(MINQUE), was used to estimate conditional variance components and predict
conditional effects. The contribution ratio was also calculated to assess the
contribution of each trait to the targeted trait (yield). Second, we integrated
conditional analysis into AMMI to further reveal the association with yield stability for
each cultivar. An R package, GenMod, developed by Dr. Jixiang Wu at South Dakota
State University, was used to conduct all analyses. Detailed results regarding
soybean yield stability will be reported accordingly.
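The AMMI model underlying this analysis fits additive genotype and environment main effects, then applies a singular value decomposition to the interaction residuals. A minimal numpy sketch of standard AMMI scores (not the conditional extension, and assuming a complete genotype-by-environment matrix of mean yields):

```python
import numpy as np

def ammi_scores(y, k=2):
    # AMMI: remove additive main effects by double-centering the
    # genotype x environment matrix, then take the SVD of the
    # interaction residuals to get multiplicative interaction terms.
    y = np.asarray(y, dtype=float)
    ge = y - y.mean(0) - y.mean(1)[:, None] + y.mean()  # interaction part
    u, s, vt = np.linalg.svd(ge, full_matrices=False)
    # Scaled scores for the first k interaction principal components.
    return u[:, :k] * s[:k], vt[:k].T
```

A purely additive table (no genotype-by-environment interaction) yields genotype scores of zero, which is one quick sanity check on an implementation like this.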
 Poster
Big Basin Ballroom D
A simulation study of the small sample properties of likelihood based inference
for the beta distribution
Kevin Thompson and Edward Gbur – University of Arkansas
Researchers often collect proportion data that cannot be interpreted as arising
from a set of Bernoulli trials. Analyses based on the beta distribution may be
appropriate for such data. The SAS® GLIMMIX procedure provides a tool for these
analyses using a likelihood based approach in the larger context of generalized
linear mixed models. Since the t and F-distribution based inference employed in this
approach relies on asymptotic properties, it is important to understand the sample
sizes required to obtain reasonable approximate answers to inference questions. In
addition, the complexity of the likelihood functions can lead to numerical issues
with the optimization algorithms. This simulation study is based on a simple
intercept-only model for known beta distributed responses. Convergence and
estimation issues are investigated over a range of beta distributions and sample
sizes.
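The simulation setup can be sketched in a few lines. The study's analyses use GLIMMIX's likelihood-based fitting; as a stdlib-only illustration, the snippet below draws an intercept-only beta sample and recovers the shape parameters by method of moments, which are commonly used as starting values for the likelihood optimization:

```python
import random
from statistics import mean, variance

def beta_mom(y):
    # Method-of-moments estimates of the beta shape parameters (a, b)
    # from the sample mean m and variance v:
    #   c = m(1-m)/v - 1,  a = m*c,  b = (1-m)*c
    m, v = mean(y), variance(y)
    c = m * (1 - m) / v - 1
    return m * c, (1 - m) * c

# Simulate one intercept-only beta response and estimate its parameters.
random.seed(42)
y = [random.betavariate(2.0, 5.0) for _ in range(1000)]
a_hat, b_hat = beta_mom(y)
```

At this sample size the estimates land close to the true (2, 5); at the small sample sizes the study targets, both convergence and bias become the interesting questions.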
 Poster
Big Basin Ballroom D
Multivariate Statistical Analysis of Terrestrial Invertebrate Index of Biotic Integrity
Bahman Shafii and William Price – University of Idaho
Norm Merz – Kootenai Tribe of Idaho
The Index of Biotic Integrity (IBI) is designed to measure the changes in ecological
and environmental conditions as affected by human disturbances. In practice, the
IBI is used in various ecological applications to detect divergence in biological
integrity attributable to human actions. Last year during this conference,
methodologies for developing an Avian Index of Biotic Integrity (A-IBI) were
presented and discussed. The objective of this presentation is to demonstrate the
construction and statistical evaluation of a multi-metric terrestrial Invertebrate
Index of Biotic Integrity (I-IBI) using the same multivariate statistical techniques.
Canonical correlation analyses were utilized to select pertinent invertebrate metrics
as impacted by vegetation and hydrology variables. The resulting invertebrate
metrics were then ranked, according to a pre-specified scale of human disturbance,
and the I-IBI scores were subsequently computed. The multivariate model, as well
as the final I-IBI scores, was statistically validated using independent temporal data
sets. The techniques are demonstrated using five years of invertebrate survey data
collected on the terrestrial environments within the historic fifty-year floodplain of
the Kootenai River in Northern Idaho.
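Canonical correlation analysis, used above to select invertebrate metrics against vegetation and hydrology variables, can be sketched via the standard QR/SVD route (a generic illustration assuming complete data matrices, not the study's code):

```python
import numpy as np

def canonical_correlations(X, Y):
    # Center each block of variables, take thin QR factorizations,
    # then the singular values of Qx' Qy are the canonical correlations
    # between the two sets of variables.
    Xc = X - X.mean(0)
    Yc = Y - Y.mean(0)
    qx, _ = np.linalg.qr(Xc)
    qy, _ = np.linalg.qr(Yc)
    return np.linalg.svd(qx.T @ qy, compute_uv=False)
```

When the two blocks are identical, every canonical correlation is 1; correlations near zero indicate metrics carrying no shared signal with the environmental variables.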
 Poster
Big Basin Ballroom D
Exploring Distributions of Seasonal Climate Forecasts
Kenneth Wakeland, Mark Kaiser, Chris Zittnan and Chris Anderson – Iowa State
University
The skill of long-range weather forecasts, from one to five months into the future, is
typically assessed using values aggregated over large expanses of space such as a
state or region, and moderate periods of time such as a month or season. But the
process of producing such forecasts produces many individual values in space and
time, values that can be used to define a variety of empirical distributions. It would
be desirable to use the information in these distributions to produce forecasts on a
finer time and space window. Such information would be of great use to both
farmers and livestock producers. We present statistical methods that can be used
to investigate the characteristics of such distributions, ranging from simple
histograms in space and time to complex hierarchical statistical models that
incorporate the effects of individual forecast runs, latitudinal gradients, and
temporal separation between model run and forecast target. We illustrate some of
these methods with 30 years of forecasted daily maximum July temperatures in
Iowa, each year of which has forecast runs started at 4 times in each of 5 days for
each of the months from February through June.
 Poster
Big Basin Ballroom D
Association Mapping with Imputed Marker Data
Yi Xu, Yajun Wu, Michael Gonda and Jixiang Wu – South Dakota State University
Association mapping has been widely used to detect desirable genetic markers for
improving traits of interest. Missing marker data can be a challenging issue in
statistics-based association mapping studies, especially for small mapping
populations with a large number of markers being investigated. Many currently
available software packages with variable selection methods are only suitable for
data with no missing points, meaning that individuals with even one missing data
point must be deleted. These deletions can cause a significant loss of genetic
information and lead to biased results and conclusions. In this study, we propose
an imputation method that generates new genotypes to substitute for missing
genotypes, based on the available linkage information. Once the new data sets are generated,
many computer tools with variable selection methods like forward, stepwise,
random forest, and lasso can be employed for association mapping studies. As a
demonstration, we applied this new approach to a barley marker data set and
detected several putative SNPs associated with heading date.
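As a toy illustration of linkage-informed imputation (not the authors' algorithm; the function and data layout here are hypothetical), a missing genotype can be filled by a majority vote among individuals that share the same genotypes at the flanking, linked markers:

```python
from collections import Counter

def impute_marker(geno, i, j):
    # geno: list of per-individual marker genotypes, None = missing.
    # Fill individual i's missing call at marker j with the most common
    # genotype among individuals matching at flanking markers j-1, j+1.
    flank = (geno[i][j - 1], geno[i][j + 1])
    votes = Counter(row[j] for row in geno
                    if row[j] is not None
                    and (row[j - 1], row[j + 1]) == flank)
    return votes.most_common(1)[0][0] if votes else None
```

Real implementations weight by recombination distance and handle heterozygotes; the point is only that flanking linkage information, rather than deletion, recovers the missing point.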
Notes