Statistical displays and methods for analyzing large data sets

Karen Kafadar
Indiana University, Bloomington
The analysis of massive, high-volume data sets strains conventional statistical software systems and,
because the entire data set often cannot be read into the software system at once, requires new ways
of drawing inferences beyond the conventional paradigm (optimal estimation of parameters from a
hypothesized distribution). Internet traffic data and data from high-energy particle physics
experiments raise additional challenges: nearly continuous streams of observations from multiple
systems or channels that interact and exchange information in nondeterministic ways. Internet data in
particular invite cyber attacks, which can spread very rapidly and therefore require methods that
detect potential departures from “typical” behavior very quickly. This talk discusses analyses of
Internet traffic data and data from high-energy physics experiments, and also mentions some open
issues in analyzing high-volume data in general. (The Internet portion of this talk involved
E.J. Wegman, George Mason University; the physics part involved R.L. Jacobsen, UC-Berkeley.)
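The abstract does not describe the talk's specific methods. As a minimal sketch, assuming a single numeric stream (for example, counts per second on one channel), the Python below shows the two constraints it names working together. First, summary statistics are updated one observation at a time using Welford's one-pass mean and variance, so the full data set never has to be held in memory. Second, each new value is checked against the running notion of "typical" as soon as it arrives. The names `RunningStats` and `flag_departures`, the threshold `k`, and the `warmup` length are hypothetical choices for illustration only. This is not the approach of Kafadar, Wegman, or Jacobsen.

```python
import math
from typing import Iterable, Iterator, Tuple


class RunningStats:
    """One-pass (Welford) mean and variance; no past observations are stored."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def sd(self) -> float:
        # Sample standard deviation; reported as 0.0 until two points are seen.
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def flag_departures(stream: Iterable[float], k: float = 4.0,
                    warmup: int = 30) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for observations more than k running SDs away
    from the running mean. A hypothetical rule, for illustration only."""
    stats = RunningStats()
    for i, x in enumerate(stream):
        # Check against "typical" behavior only after a warm-up period.
        if stats.n >= warmup and stats.sd > 0 and abs(x - stats.mean) > k * stats.sd:
            yield i, x
        stats.update(x)  # every value, flagged or not, updates the baseline


if __name__ == "__main__":
    data = [10.0 + (i % 5) for i in range(200)]  # regular values 10..14
    data[150] = 100.0                            # one injected spike
    print(list(flag_departures(data)))           # prints [(150, 100.0)]
```

Because `flag_departures` is a generator, it can consume an unbounded source (a file read line by line or a socket) in constant memory. Updating the baseline with flagged values lets the detector adapt to a lasting level shift, at the cost of letting a burst of anomalies inflate the estimated spread. Excluding flagged values makes the opposite trade-off.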