Statistics in Biosciences: Statistical Methods for Big Data from Health
Science
Organizer and Chair: Grace Yi (University of Waterloo)
XIHONG LIN, Harvard University
The Generalized Higher Criticism for Testing SNP-set Effects in Genetic Association Studies
We propose the Generalized Higher Criticism (GHC) to test for the association between a SNP set, e.g., a gene or a network,
and a disease outcome in the presence of sparse alternatives. The proposed GHC overcomes the limitations of the HC by allowing
for arbitrary correlation structures among the SNPs in a SNP-set, while performing accurate analytic p-value calculations for
any finite number of SNPs in the SNP-set. We obtain the detection boundary of the GHC test. Using simulations, we compare
the power of the GHC method with that of existing SNP-set tests over a range of genetic regions with varied correlation
structures and signal sparsity. We apply the proposed methods to analyze the CGEM breast cancer genome-wide association
study.
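For readers unfamiliar with the HC mentioned above, the following is a minimal sketch of the classical Higher Criticism statistic in its Donoho-Jin form, which assumes independent p-values. It is not the GHC proposed in the talk, which additionally accounts for correlation among SNPs. The function name `higher_criticism` and the choice to take the maximum over all ordered p-values (rather than a restricted range) are assumptions made for illustration.

```python
import math


def higher_criticism(pvalues):
    """Classical HC statistic for independent p-values (illustrative sketch).

    HC = max_i sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))),
    where p_(1) <= ... <= p_(n) are the ordered p-values.
    """
    p = sorted(pvalues)
    n = len(p)
    best = -math.inf
    for i, p_i in enumerate(p, start=1):
        # Skip p-values of exactly 0 or 1, where the denominator vanishes.
        if not 0.0 < p_i < 1.0:
            continue
        z = math.sqrt(n) * (i / n - p_i) / math.sqrt(p_i * (1.0 - p_i))
        best = max(best, z)
    return best


# Hypothetical inputs: evenly spread p-values versus the same set
# with two very small p-values standing in for sparse signals.
null_like = [(i - 0.5) / 20 for i in range(1, 21)]
sparse = [1e-6, 1e-5] + null_like[2:]
```

A few very small p-values among otherwise unremarkable ones inflate the statistic sharply, which is why HC-type tests are suited to sparse alternatives.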
HONGZHE LI, University of Pennsylvania
Sparse Simultaneous Signal Detection and Its Applications in Genomics
The increasing availability of large-scale genomic data has made possible an integrative approach to studying disease. Such
research seeks to uncover disease mechanisms by combining multiple types of genomic information, which may be collected
on multiple sets of patients. I focus on a study that integrates GWAS and eQTL data collected from two different sets of
subjects to find transcripts potentially functionally relevant to human heart failure. I formalize a model that defines important
transcripts as those whose expression levels are associated with SNPs that are simultaneously associated with disease and
propose a new test for detecting simultaneous signals. I show that the test statistic is asymptotically optimal
under certain conditions. I present several applications and extensions.
CHARMAINE DEAN & MARK WOLTERS, Western University & Fudan University
Parameter Estimation in Autologistic Regression Models for Detection of Smoke in Satellite Images
Smoke from forest fires is a health hazard that is difficult to study through direct measurement. Images from earth-orbiting
satellites provide a potentially valuable data source to catalogue smoke events over space and time. We are developing
classifiers to segment satellite images into smoke and nonsmoke regions using the autologistic regression model, a Markov
random field model with logistic regression as a special case. The large size of the images (both in terms of pixel count
and number of image planes) introduces a variety of computational challenges when using this model. The talk will focus
on parameter estimation, comparing alternative approaches and discussing how the goal of the study (predictive accuracy or
parameter interpretation) might influence the choice of estimation method.