Milena Mihail, Associate Professor
College of Computing
Georgia Institute of Technology
mihail@cc.gatech.edu

Title: Platforms Supporting the Long Tail of Web Statistics and Intelligent Data Collection

In line with recent government and industrial initiatives promoting (a) large-scale data collection systems, especially concerning web data, for the purposes of facilitating research in this area, and (b) interdisciplinary research and education across computing and other sciences, in the Microsoft SemGrail Workshop, I wish to raise the following points:

(a) In any meaningful data collection effort or study of the web, the human should be considered an integral part of the web infrastructure. With what data, and by which methods, should we start such a data collection effort?

(b) The business value of the web is well accepted and established (for example, by businesses like Amazon and eBay, and by advertising systems like AdWords). All these systems use several formal approaches, such as recommendation systems, collaborative filtering, online algorithms, auctions, and algorithmic game theory. Most of the business value of the web is based on observed skewed or heavy-tailed statistics (for example, Amazon does 30% of its business outside the top 100K titles). This represents a dramatic paradigm shift away from the principles of economies of scale and mass markets developed over the last century. We raise the question: Can similar formal approaches increase, in a quantifiable way, the social value of the web? Can we harness the semantics of web communities, and understand, facilitate and promote the culture (broadly defined) of users whose profiles deviate substantially from the average user profile?

Short Bio: Milena Mihail received her BA at the National Technical University of Athens in 1984 and her PhD at Harvard University in 1989. She was a member of the technical staff, senior research scientist and manager at Bell Communications Research from 1989 to 1998.
She has been on the faculty at Georgia Tech since 1999. Within computer science, her primary fields are theory, networking and large-scale datasets.