Summer research positions

advertisement
Possible summer research projects:
1. One proposed project would support the department's efforts with the Carl Wieman
Science Education Initiative. Possible areas of research include (a) evaluation of an
online homework system to be trialed in the department over summer 2012, (b)
development of a concept inventory for STAT 241/251, and (c) assessment of student
retention from certain key undergraduate courses, such as STAT 241/251, STAT 302
and STAT 305. This project will be partially funded by the Carl Wieman Science
Education Initiative, and partially by the Faculty of Science. Note that INTERNATIONAL
STUDENTS are eligible for this position.
If you are interested in this position, please contact Dr. Bruce Dunham:
b.dunham@stat.ubc.ca
2. The following topics and projects are proposed by Dr. Alexandre Bouchard-Côté. If
you are interested in working with Dr. Bouchard on any of these topics, please contact
him directly: bouchard@stat.ubc.ca
-Topic: Statistics on large-scale datasets
Context: The field of statistics is currently in the embarrassing situation of having more
data than it can handle. Most current statistical computing methods do not scale to the
large datasets produced by the web and by large-scale scientific projects. In particular,
methods that require estimating a posterior distribution often rely on Markov Chain
Monte Carlo (MCMC), a method that is notoriously slow for large datasets. Fortunately,
many fast alternatives are emerging.
Projects:
-- Variational inference is one of these alternatives. It works by looking at the
problem of computing the posterior as a constrained optimization problem, and by
relaxing this optimization problem. This project would involve looking at a new way of
relaxing the optimization problem that could potentially yield much more accurate
results.
-- Another project in this topic is to explore which MCMC alternatives can be
effectively computed in parallel. This can have a high impact in large-scale Bayesian
analysis scenarios.
-- Beyond posterior computation, many other statistical problems need to be
scaled up. One project would be to apply some recent randomized algorithms to
large-scale versions of common parameter estimation problems.
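For readers unfamiliar with the variational view mentioned in the first project, one standard textbook formulation is sketched below. It is an illustration only: the advertisement does not say which formulation or relaxation the project would use. The sketch assumes an exponential-family model with sufficient statistics \phi, natural parameter \theta and log-normalizer A(\theta); the symbols \mathcal{M}, \mathcal{L}, H and \tilde{H} are notation introduced here, not taken from the project description.

```latex
% Exact variational representation of the log-normalizer
% (a constrained optimization over realizable mean parameters):
A(\theta) \;=\; \sup_{\mu \in \mathcal{M}} \bigl\{ \langle \theta, \mu \rangle + H(\mu) \bigr\},
\qquad
\mathcal{M} \;=\; \bigl\{ \mathbb{E}_q[\phi(X)] : q \text{ a distribution} \bigr\}.

% A relaxation swaps the constraint set and the entropy for tractable surrogates:
\tilde{A}(\theta) \;=\; \sup_{\mu \in \mathcal{L}} \bigl\{ \langle \theta, \mu \rangle + \tilde{H}(\mu) \bigr\},
\qquad
\mathcal{L} \supseteq \mathcal{M}, \quad \tilde{H} \approx H.
```

Here H(\mu) denotes the entropy of the exponential-family member whose mean parameter is \mu. In the exact problem the supremum is attained at the true moments \mathbb{E}_\theta[\phi(X)], so the maximizer of the relaxed problem can be read as an approximation to those moments.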
-Topic: Phylogenetics
Context: The goal of the field of phylogenetics is to reconstruct evolutionary histories by
studying genetic relatedness among populations. This is a hard and important biological
question, but most of the current challenges in this field are statistical and
computational.
Projects:
-- One project would be to look at new alternatives to tree models and their
statistical and computational properties. These models bring new capacities such as
gene alignment inference but also new challenges.
-- Until recently, phylogenetics was generally done by using a single DNA
sequence to represent each population, but thanks to new genotyping technologies, we
can now look at the frequency of certain mutations across many individuals in each
population. Phylogenetic models that can exploit that type of data are currently
needed.
-- I am also working on related projects in population and family genetics. Talk
to me if you want to hear more about these as well!
-Topic: Statistics in linguistics
Context: The collection of all human languages is one of the most exciting existing
datasets, but surprisingly it has not yet been intensely studied by statisticians. As more
than half of the world's languages are currently threatened with extinction, more work is
needed on the task of recording and analyzing this data before it is too late.
Projects:
-- One project would involve developing a machine learning tool to assist field
linguists in their work. This is a more applied project that would be done in collaboration
with linguists.
-- A related project is to extract linguistic data from the web, again by applying
machine learning techniques. For example, can machine learning algorithms be used to
organize semi-structured datasets such as Wiktionary or Wikipedia?
-- Another project would focus on identifying statistical regularities in
languages. Given a cross-linguistic dataset of languages and/or language changes, the
goal would be to perform a model selection study.
3. The following projects are proposed by Dr. Matias Salibián Barrera. If you are
interested in any of these projects, please contact Dr. Salibián directly for more
information: matias@stat.ubc.ca
A. - Robust inference for linear regression models.
This project involves implementing robust tests for linear hypotheses on the
regression coefficients. The tests are based on a high-breakdown and efficient
robust scale estimator, and the corresponding p-values are estimated using our robust
and fast bootstrap method. Parts of this methodology are already implemented in
MATLAB. The summer project consists of translating this code to R, and possibly
creating a corresponding package.
B. - Robust principal components for functional data.
This project deals with a novel principal components analysis method for functional data.
The method is robust to the presence of potential outliers in the data. I am interested in
exploring other properties of this method, e.g. its ability to deal with sparse functional
observations and different types of atypical observations. We will need to run several
numerical experiments comparing this new proposal with other existing ones in the
literature.
C. - Re-sampling methods for Support Vector Machines.
This project concerns studying the properties and performance of a fast bootstrap
method for SVMs. The motivation for this proposal is to be able to build ensembles or to
perform statistical inference (e.g. point-wise confidence bands) in a computationally
efficient and feasible way. The summer project involves performing a thorough literature
review and running numerical experiments to study the properties of this new proposal.
D. - Sparse Kernel K-Means.
This project is concerned with the extension of the sparse k-means algorithm to the
corresponding kernel k-means setup. The summer project involves performing a
thorough literature review, implementing this new proposal in R and running numerical
experiments to study the properties of this new proposal.