Abstract - Mathematical and Statistical Sciences

THE 30th ANNUAL ALBERTA STATISTICIANS' MEETING
Saturday, October 25, 2008
University of Alberta
Central Academic Building
Room 265
The 30th Annual Alberta Statisticians’ Meeting is sponsored by the Statistics Centre of the
Department of Mathematical and Statistical Sciences and also by PIMS.
PROGRAM:
12:30 to 13:30: Reception, Registration in CAB 269
13:30 to 14:00: Professor Jingjing Wu, "The application of MHD estimation in a
semiparametric model."
14:00 to 14:30: Professor Deniz Sezer, "Quantitative bounds for Markov Chain
convergence: Wasserstein and Total variation distances."
14:30 to 15:00: Professor Gordon Fick, Department of Community Health Sciences,
Faculty of Medicine, "Modifying a Modifier, Confounding a Modifier, Confounding a
Confounder."
15:00 to 16:00: Coffee break in CAB 269
16:00 to 16:30: Dr. Wanhua Su, "Efficient Kernel Methods for Statistical Detection"
16:30 to 17:00: Professor Pengfei Li, “Hypothesis Test for Normal Mixture Models: the
EM Approach” (Joint work with Professor Jiahua Chen)
17:00 to 17:30: Professor Peter Hooper, “Bayesian inference for belief net responses”.
18:30: Dinner at the home of Professor Doug Wiens (details below).
CONFERENCE DINNER: 18:30 at Professor Doug Wiens’ house, 9702 89 Ave NW,
Edmonton, AB. Directions will be provided.
REGISTRATION:
The registration fee is expected to be about $10 per person attending the dinner; those
attending only the talks pay nothing. The fee will be waived for graduate students and,
upon request, for those without research grants.
Registration fees can be paid in cash only.
Abstracts for the talks:
1. Professor Jingjing Wu, "The application of MHD estimation in a
semiparametric model."
2. Professor Deniz Sezer, "Quantitative bounds for Markov Chain
convergence: Wasserstein and Total variation distances."
In this talk I will present recent results on Markov Chain convergence
based on joint work with Neal Madras. Let P_n^x and \pi be,
respectively, the n-step transition probability kernel and the stationary
distribution of a Markov chain. In many applications it is desirable to
have a quantitative bound for the convergence of P_n^x to \pi, i.e., a bound
of the form d(P_n^x, \pi) < g(x, n), where d is a metric on the space of
probability measures and g is a function which can be computed
explicitly. In continuous state spaces, one way to obtain a quantitative
bound is to formulate the Markov chain as an iterated system of random
maps and apply David Steinsaltz's local contractivity convergence
theorem. If its conditions are satisfied, this theorem yields a
quantitative bound in terms of Wasserstein distance. We first develop a
systematic framework to check for the conditions of Steinsaltz's
theorem, and then show how one can obtain a quantitative bound in terms
of total variation distance from a quantitative bound in terms of
Wasserstein distance.
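To make the shape of such a bound concrete, here is a minimal Python sketch (my
illustration, not material from the talk): an AR(1) chain X_{n+1} = a X_n + Z_n is an
iterated system of random maps x -> a x + z, each contracting distances by the factor
|a|, which gives the explicit Wasserstein bound d(P_n^x, \pi) <= |a|^n (|x| + E|Y|)
with Y ~ \pi. The script compares this bound with a Monte Carlo estimate.

    import numpy as np
    from scipy.stats import wasserstein_distance

    rng = np.random.default_rng(0)
    a, x0, n, reps = 0.8, 5.0, 20, 20000

    # Run reps independent copies of the chain for n steps from the fixed start x0,
    # giving a sample from the n-step distribution P_n^x.
    x = np.full(reps, x0)
    for _ in range(n):
        x = a * x + rng.standard_normal(reps)

    # The stationary distribution is pi = N(0, 1/(1 - a^2)); draw a reference sample.
    sigma = np.sqrt(1.0 / (1.0 - a ** 2))
    pi_sample = rng.normal(0.0, sigma, reps)

    # Contraction bound g(x, n) = |a|^n * (|x0| + E|Y|), with E|Y| = sigma * sqrt(2/pi).
    bound = abs(a) ** n * (abs(x0) + sigma * np.sqrt(2.0 / np.pi))
    est = wasserstein_distance(x, pi_sample)
    print(f"W1 estimate: {est:.4f}   contraction bound g(x, n): {bound:.4f}")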
3. Professor Gordon Fick, Department of Community Health Sciences,
Faculty of Medicine,
"Modifying a Modifier, Confounding a Modifier, Confounding a Confounder."
Objectives:
1) A paradigm in epidemiology will be applied to modeling
2) This modeling application will suggest some interesting interpretations
3) The interpretations will be advanced by appropriate graphs
Abstract:
When an epidemiologist considers the assessment of a disease/exposure
relationship, a key part of this scientific paradigm entails the
assessment of potential modifiers first and then, in the absence of
modification, the assessment of potential confounders. Statistical
modeling can enable this assessment in many settings.
Take a scenario with interest in a disease (D)/exposure (E)
relationship with age (A) and gender (G) as possible confounders/
modifiers. For illustration, let's suppose we have a study that enables
the consideration of the outcome p=Pr(D). The concepts detailed in this
seminar could apply to any outcome: dichotomous, ordinal or interval. In
a sense, all the concepts relate to the 'right-hand' side of the
appropriate regression equation. We can take, for illustration of the
relevant concepts, a study for which the (log) odds of disease is the
primary outcome. Suppose that we have planned a backward elimination
modeling approach but that we wish to incorporate the scientific
paradigm from epidemiology as well.
An investigation using age groupings is often considered by
epidemiologists. This leads to a stratified analysis: the classic tests
for modification via the consideration of possible heterogeneity of the
stratum-specific odds ratios and then, if appropriate, the assessment of
possible confounding.
It is suggested in this seminar that a model-based
approach ought to proceed, in part at least, in the same manner:
consideration of modification and then consideration of confounding. We
will see through some illustrations that there are interesting
combinations of the notions of confounding and modification that can be
considered via the modeling approach.
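As a rough illustration of that order of assessment (the sketch and its simulated data
are mine, not Professor Fick's), a logistic model can test modification first through
interaction terms on the 'right-hand' side, and then assess confounding by comparing
the crude and adjusted exposure coefficients:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    n = 2000
    df = pd.DataFrame({
        "E": rng.integers(0, 2, n),    # exposure
        "A": rng.normal(50, 10, n),    # age
        "G": rng.integers(0, 2, n),    # gender
    })
    # Simulate a dichotomous disease outcome with no true modification.
    logit_p = -3 + 0.7 * df.E + 0.03 * df.A + 0.2 * df.G
    df["D"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

    # Step 1: assess modification via the E:A and E:G interaction terms.
    full = smf.logit("D ~ E * A + E * G", data=df).fit(disp=False)
    print(full.wald_test_terms())

    # Step 2: in the absence of modification, assess confounding by comparing
    # the crude and adjusted log odds ratios for E.
    crude = smf.logit("D ~ E", data=df).fit(disp=False)
    adj = smf.logit("D ~ E + A + G", data=df).fit(disp=False)
    print(f"crude log OR: {crude.params['E']:.3f}  adjusted: {adj.params['E']:.3f}")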
4. Professor Pengfei Li: Hypothesis Test for Normal Mixture Models: the EM Approach (Joint
work with Professor Jiahua Chen)
Normal mixture distributions are arguably the most important mixture models, and also the most
challenging technically. Given a set of random samples, the likelihood function of the normal
mixture model is unbounded unless an artificial bound is placed on its component variance parameter.
Moreover, the model is not strongly identifiable, so it is hard to differentiate between over-dispersion
caused by the presence of a mixture and that caused by a large variance; and it has infinite Fisher
information with respect to mixing proportions. There has been extensive research on finite normal
mixture models, but much of it addresses merely consistent point estimation or useful practical
procedures, and many results require undesirable restrictions on the parameter space.
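A small numerical sketch (my illustration, not from the paper) of the unboundedness
noted above: centre one mixture component at a single observation and shrink its
standard deviation; the log-likelihood diverges, which is why an artificial lower
bound on the component variance is needed.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(2)
    x = rng.normal(0.0, 1.0, 50)   # a homogeneous N(0, 1) sample

    def mix_loglik(x, w, mu1, s1, mu2, s2):
        # log-likelihood of a two-component normal mixture
        return np.sum(np.log(w * norm.pdf(x, mu1, s1) + (1 - w) * norm.pdf(x, mu2, s2)))

    # Glue component 1 to the observation x[0] and let its spread s1 shrink to zero.
    for s1 in [1.0, 0.1, 0.01, 0.001]:
        print(f"s1 = {s1:7.3f}   loglik = {mix_loglik(x, 0.5, x[0], s1, 0.0, 1.0):10.2f}")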
We show that an EM-test for homogeneity is effective at overcoming many challenges in the context
of finite normal mixtures. We find that the limiting distribution of the EM-test is chi-square
with two degrees of freedom when the component variances are unequal and unknown. Simulations show
that this limiting distribution approximates the finite-sample distribution satisfactorily. A real
example is used to illustrate the
application of the EM-test.
5. Professor Peter Hooper: Quantifying the Uncertainty of a Belief Net Response
A Bayesian belief network models a joint distribution over variables using a directed acyclic graph
to represent variable dependencies and network parameters to represent conditional distributions of
variables given their immediate parents. From a Bayesian perspective, parameters are random
variables with distributions reflecting uncertainty. Belief networks are commonly used to compute
responses to queries, i.e., return a number for P(H=h | E=e). Parameter uncertainty induces
uncertainty about query responses. I will describe theory and methods, both exact and approximate,
for quantifying this type of uncertainty. The discussion will include a new "network doubling"
technique used to obtain a highly accurate approximation of the variance of a query response.
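For intuition only (a plain Monte Carlo sketch, not the network-doubling technique of
the talk, and with made-up counts), parameter uncertainty can be propagated through a
toy two-node network H -> E with Beta posteriors on the parameters, giving the mean
and variance of the query response P(H=1 | E=1):

    import numpy as np

    rng = np.random.default_rng(3)
    m = 100000

    # Posterior draws of the network parameters (Beta posteriors from
    # hypothetical counts under uniform priors).
    p_h  = rng.beta(30 + 1, 70 + 1, m)    # P(H=1)
    p_e1 = rng.beta(40 + 1, 10 + 1, m)    # P(E=1 | H=1)
    p_e0 = rng.beta(15 + 1, 55 + 1, m)    # P(E=1 | H=0)

    # Query response as a function of the random parameters (Bayes' rule).
    q = p_e1 * p_h / (p_e1 * p_h + p_e0 * (1 - p_h))
    print(f"mean response: {q.mean():.4f}   std dev: {q.std():.4f}")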
6. Dr. Wanhua Su: Efficient Kernel Methods for Statistical Detection
This research is motivated by a drug discovery problem -- the AIDS anti-virus database from the
National Cancer Institute. The objective of the study is to develop effective statistical methods to
model the relationship between the chemical structure of a compound and its activity against the
HIV-1 virus. As a result, the structure-activity model can be used to predict the activity of new
compounds and thus helps identify those active chemical compounds that can be used as drug
candidates. Since active compounds are generally rare in a compound library, we recognize the drug
discovery problem as an application of the so-called statistical detection problem. In a typical
statistical detection problem, we have data {(yi, xi)}, where xi is the predictor vector of the ith
observation and yi ∈ {0, 1} is its class label. The objective of a statistical detection problem is to
identify class-1 observations, which are extremely rare. Besides the drug discovery problem, other
applications of the detection problem include direct marketing and fraud detection.
We have proposed a computationally efficient detection method called LAGO, which stands for
"locally adjusted GO estimator". The original idea was inspired by an ancient game known today as
"GO". The construction of LAGO consists of two steps. In the first step, we estimate the density of
class 1 with an adaptive bandwidth kernel density estimator. The kernel functions are located at and
only at the class-1 observations. The bandwidth of the kernel function centered at a certain class-1
observation is calculated as the average distance between this class-1 observation and its K-nearest
class-0 neighbors. In the second step, we adjust the density estimated in the first step locally
according to the density of class 0. It can be shown that the amount of adjustment in the second step
is approximately inversely proportional to the bandwidth calculated in the first step. Application to
the NCI data demonstrates that LAGO is superior to methods such as K nearest neighbors and
support vector machines.
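The two steps might look roughly as follows (my reading of the abstract, with made-up
data; K and alpha are the tuning parameters referred to in this abstract):

    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(4)
    X1 = rng.normal(2.0, 0.5, (20, 2))    # rare class-1 observations
    X0 = rng.normal(0.0, 1.0, (500, 2))   # abundant class-0 observations
    K, alpha = 5, 1.0

    # Step 1: adaptive bandwidth r_i = average distance from each class-1
    # point to its K nearest class-0 neighbours.
    dists, _ = cKDTree(X0).query(X1, k=K)
    r = dists.mean(axis=1)

    def lago_score(x):
        # Gaussian kernels centred at (and only at) the class-1 points with
        # bandwidths alpha * r_i.  Step 2: dividing by the class-0 density,
        # which is roughly proportional to 1 / r_i, amounts to multiplying
        # each kernel term by r_i.
        d2 = ((x - X1) ** 2).sum(axis=1)
        bw = alpha * r
        return np.sum(r * np.exp(-d2 / (2 * bw ** 2)) / (2 * np.pi * bw ** 2))

    print(f"score near class 1: {lago_score(np.array([2.0, 2.0])):.4f}")
    print(f"score near class 0: {lago_score(np.array([0.0, 0.0])):.4f}")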
One drawback of the existing LAGO is that it provides only a point estimate of a test point's
probability of being class 1, ignoring the uncertainty of the model. In the second part of this thesis,
we present a Bayesian framework for LAGO, referred to as BLAGO. This Bayesian approach enables
quantification of uncertainty. Non-informative priors are adopted. The posterior distribution is
calculated over a grid of (K, α) pairs by integrating out β0 and β1 using the Laplace approximation,
where K and α are two parameters to construct the LAGO score. The parameters β0 and β1 are the
coefficients of the logistic transformation that converts the LAGO score to the probability scale.
BLAGO provides proper probabilistic predictions with support on (0, 1) and captures the
uncertainty of the predictions as well.
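As a sketch of the Laplace step (my illustration, for one hypothetical (K, α) grid
point): fit the logistic model p(y = 1) = logistic(β0 + β1·score) and approximate the
marginal likelihood by a quadratic expansion around the mode of (β0, β1) under flat
priors.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(5)
    score = rng.normal(0.0, 1.0, 200)    # stand-in LAGO scores (made up)
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * score))))

    def negloglik(b):
        # negative log-likelihood of the logistic model with b = (beta0, beta1)
        eta = b[0] + b[1] * score
        return np.sum(np.log1p(np.exp(eta)) - y * eta)

    opt = minimize(negloglik, np.zeros(2))    # mode under flat priors
    b0, b1 = opt.x
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * score)))
    X = np.column_stack([np.ones_like(score), score])
    H = X.T @ (X * (p * (1.0 - p))[:, None])  # Hessian of negloglik at the mode

    # Laplace: log m ~ -negloglik(b*) + (d/2) log(2 pi) - 0.5 log|H|, with d = 2.
    log_marginal = -opt.fun + np.log(2.0 * np.pi) - 0.5 * np.linalg.slogdet(H)[1]
    print(f"Laplace log marginal likelihood at this (K, alpha): {log_marginal:.2f}")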