Shaw-Hwa Lo

 Data Sciences and Operations
Friday, April 24, 2014 01:30 PM – 02:30 PM
Hoffman Hall 506
Statistics Presents
Shaw-Hwa Lo, Professor
Department of Statistics
Columbia University
Interaction-Based Learning and Predictions in Big Data.
We consider a computer-intensive approach (Partition Retention (PR, 09)), based on an earlier method (Lo and
Zheng (2002)), for detecting which of many potential explanatory variables have an influence on a dependent
variable
Y. This approach is suited to detecting influential variables in groups, where causal effects depend on the
confluence of values of several variables. Guided by a measure of influence I, it has the advantage of avoiding
a difficult direct analysis involving possibly thousands of variables. We apply PR to more challenging real-data
applications, typically involving complex and extremely high-dimensional data. The quality of variables selected
is evaluated in two ways: first by classification error rates, then by functional relevance using external biological
knowledge.
We demonstrate that (1) the classification error rates can be significantly reduced by considering interactions;
(2) incorporating interaction information into data analysis can be very rewarding in generating novel
scientific findings. Heuristic explanations will be provided. If time permits, we tackle and dissect a scientific
puzzle that highly predictive variables do not necessarily appear as highly significant, thus evading the researcher
using significance-based methods. If prediction is the goal, we should lay aside significance as the only selection
standard.