VLDB2013
39th International Conference on Very Large Data Bases, Riva del Garda, Trento, Italy

Proceedings of the VLDB Endowment
Volume 6, No. 8 – June 2013

Proceedings of the 39th International Conference on Very Large Data Bases, Riva del Garda, Trento, Italy

Editors-in-Chief: Michael Böhlen, Christoph Koch
Associate Editors – Research Track: Ashraf Aboulnaga, Sihem Amer-Yahia, Chee Yong Chan, Yanlei Diao, Ada Waichee Fu, Johannes Gehrke, Alon Halevy, Jayant Haritsa, Nikos Mamoulis, Thomas Neumann, Dan Olteanu, Divesh Srivastava, Jens Teubner
Associate Editor – Experiments and Analysis Track: Stefan Manegold
Guest Editors: Sihem Amer-Yahia, Stefan Manegold
Proceedings Editors: Peer Kröger, Stratis D. Viglas

PVLDB – Proceedings of the VLDB Endowment
Volume 6, No. 8, June 2013.
The 39th International Conference on Very Large Data Bases, Riva del Garda, Trento, Italy.

Copyright 2013 VLDB Endowment

Permission to make digital or hard copies of portions of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyright for components of this work owned by others than VLDB Endowment must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists requires prior specific permission and/or a fee. Request permission to republish from PVLDB under email: info@vldb.org.

Volume 6, Number 8, June 2013: VLDB 2013
Pages ii - x and 541 - 600
ISSN 2150-8097
Additional copies only online at: portal.acm.org, arxiv.org/corr, and www.vldb.org

PVLDB Vol. 6 No. 8
VLDB2013 – Riva del Garda, Trento, Italy

TABLE OF CONTENTS

Front Matter
Copyright Notice .......... ii
Table of Contents .......... iii
VLDB 2013 Organization and Review Board .......... iv

Letters
Letter from the Guest Editors
    Sihem Amer-Yahia, Stefan Manegold .......... x

Research Papers
Hybrid Storage Management for Database Systems
    Xin Liu, Kenneth Salem .......... 541
Scorpion: Explaining Away Outliers in Aggregate Queries
    Eugene Wu, Samuel Madden .......... 553
Ratio Threshold Queries over Distributed Data Sources
    Rajeev Gupta, Krithi Ramamritham, Mukesh Mohania .......... 565
On the Complexity of Query Result Diversification
    Ting Deng, Wenfei Fan .......... 577
Streaming Quotient Filter: A Near Optimal Approximate Duplicate Detection Approach for Data Streams
    Sourav Dutta, Ankur Narang, Suman K. Bera .......... 589

VLDB 2013 ORGANIZATION AND REVIEW BOARD

General Chairs
Themis Palpanas, University of Trento
Yannis Velegrakis, University of Trento

Program Chairs
Michael Böhlen, University of Zurich
Christoph Koch, EPFL

Advisory Board
Paolo Atzeni, Università Roma Tre
Stefano Ceri, Politecnico di Milano
John Mylopoulos, University of Trento

Award Committee
Surajit Chaudhuri, Microsoft (Chair)
Mike Carey, University of California, Irvine
Susan Davidson, University of Pennsylvania
Alon Halevy, Google
Sunita Sarawagi, IIT Bombay

Associate Editors
Ada Wai-Chee Fu, Chinese University of Hong Kong
Alon Halevy, Google
Ashraf Aboulnaga, University of Waterloo
Chee-Yong Chan, National University of Singapore
Dan Olteanu, Oxford University
Divesh Srivastava, AT&T Labs
Jayant Haritsa, Indian Institute of Science Bangalore
Jens Teubner, ETH Zurich
Johannes Gehrke, Cornell University
Nikos Mamoulis, University of Hong Kong
Sihem Amer-Yahia, Qatar Computing Research Institute
Stefan Manegold, CWI
Thomas Neumann, Technische Universität München
Yanlei Diao, University of Massachusetts Amherst

Experiments and Analysis Track Associate Editor
Stefan Manegold, CWI

Industrial and Applications Track Associate Editors
Min Wang, HP Labs China
Cong Yu, Google Research

Demonstration Chairs
Jun Yang, Duke University
Dimitrios Gunopulos, University of Athens
Letizia Tanca, Politecnico di Milano

Reproducibility Chairs
Philippe Bonnet, IT University of Copenhagen
Juliana Freire, New York University
Dennis Shasha, New York University

Research Track Review Board
Karl Aberer, EPFL, Switzerland
Brian Cooper, Google
Foto Afrati, NTU Athens
Bin Cui, Peking University
Charu Aggarwal, IBM T. J. Watson Research Center
Carlo Curino, MIT
Yanif Ahmad, JHU
Sudipto Das, Microsoft Research
Jose-Luis Ambite, University of Southern California
Anish Das Sarma, Google Research
Walid Aref, Purdue University
Atish Das Sarma, eBay Research Labs
Magdalena Balazinska, University of Washington
Antonios Deligiannakis, Technical University of Crete
Srikanta Bedathur, IIIT Delhi
Amol Deshpande, University of Maryland
Peter Boncz, CWI
Xin Luna Dong, AT&T Labs-Research
Nico Bruno, Microsoft
Sameh Elnikety, Microsoft Research
Randal Burns, JHU
Mohamed Eltabakh, Worcester Polytechnic Institute
Andrea Cali, University of London, Birkbeck College
Alan Fekete, University of Sydney
Carlos Castillo, Yahoo!
Hakan Ferhatosmanoglu, Bilkent University
Gang Chen, Zhejiang University
Alvaro Fernandes, U. of Manchester
Lei Chen, Hong Kong University of Science and Technology
Juliana Freire, New York University
Benjamin C. M. Fung, Concordia University
Shimin Chen, HP Labs China
Fabien Gandon, INRIA
James Cheng, CUHK
Reynold Cheng, University of Hong Kong
Minos Garofalakis, Technical University of Crete, Greece
Gao Cong, Nanyang Technological University
Buğra Gedik, Bilkent University
Rainer Gemulla, Max-Planck-Institut Saarbrücken
Paul Larson, Microsoft
Gabriel Ghinita, University of Massachusetts Boston
Mong-Li Lee, National University of Singapore
Parke Godfrey, York University
Wang-Chien Lee, Penn State University
Michaela Goetz, Cornell University
Wolfgang Lehner, Technische Universität Dresden
Lukasz Golab, University of Waterloo
Chengkai Li, The University of Texas at Arlington
Sergio Greco, University of Calabria
Cuiping Li, Renmin University of China
Le Gruenwald, University of Oklahoma
Feifei Li, University of Utah
Krishna Gummadi, MPI
Guoliang Li, Tsinghua University
Haryadi Gunawi, University of California, Berkeley
Lipyeow Lim, University of Hawaii at Manoa
Rahul Gupta, IIT Bombay
Xuemin Lin, University of New South Wales
Marios Hadjieleftheriou, AT&T Labs
Eric Lo, The Hong Kong Polytechnic University
Kuno Harumi, HP Labs
Boon Thau Loo, University of Pennsylvania
Michael Hay, Cornell
Qiong Luo, Hong Kong University of Science and Technology
Bingsheng He, NTU Singapore
Ashwin Machanavajjhala, Duke University
Sven Helmer, Free University of Bozen-Bolzano
Sanjay Madria, University of Missouri-Rolla
Howard Ho, IBM Almaden Research
Amélie Marian, Rutgers University
Katja Hose, Aalborg University
Frank McSherry, Microsoft
Bill Howe, University of Washington
Sharad Mehrotra, University of California, Irvine
Jeong-Hyon Hwang, State University of New York, Albany
Poess Meikel, Oracle
Stratos Idreos, CWI
Mohamed Mokbel, University of Minnesota
Hans-Arno Jacobsen, University of Toronto
Bongki Moon, University of Arizona
Ricardo Jimenez-Peris, Technical University of Madrid
Kyriakos Mouratidis, Singapore Management University
Ruoming Jin, Kent State University
Gero Muhl, University of Rostock
Ryan Johnson, University of Toronto
Karin Murthy, IBM Research
Vanja Josifovski, Yahoo Inc.
Suman Nath, MSR
Panos Kalnis, King Abdullah University of Science and Technology
Wolfgang Nejdl, University of Hannover
Vana Kalogeraki, Athens Univ. of Econ. and Business
Sylvia Nittel, University of Maine
Carl-Christian Kanne, University of Mannheim
Beng Chin Ooi, National University of Singapore
Hillol Kargupta, University of Maryland Baltimore County
Tamer Ozsu, University of Waterloo
Esther Pacitti, University of Montpellier
Yiping Ke, Institute of High Performance Computing
Ippokratis Pandis, IBM Almaden
Anne-Marie Kermarrec, INRIA
Olga Papaemmanouil, Brandeis University
Daniel Kifer, PSU
Srinivasan Parthasarathy, The Ohio State University
Changkyu Kim, Intel
Jignesh Patel, University of Wisconsin
George Kollios, Boston University
Peter Pietzuch, Imperial College London
Christian König, Microsoft Research
Neoklis Polyzotis, University of California, Santa Cruz
Laks V. S. Lakshmanan, University of British Columbia
Lucian Popa, IBM Research
Bordawekar Rajesh, IBM T.J. Watson
Evimaria Terzi, University of Boston
Vibhor Rastogi, Yahoo
Martin Theobald, Max Planck Institute, Germany
Christopher Re, University of Wisconsin, Madison
Anthony Tung, National University of Singapore
Matthias Renz, Ludwig-Maximilians University Munich, Germany
Kostas Tzoumas, Technical University of Berlin
Marie-Christine Rousset, IMAG
Sergei Vassilvitskii, Google
Stratis D. Viglas, University of Edinburgh
Sourav S. Bhowmick, Nanyang Technological University
Dimitris Sacharidis, IMIS Athena, Greece
Ke Wang, Simon Fraser University
Ingmar Weber, Yahoo!
Kenneth Salem, University of Waterloo
Maria Sapino, University of Torino
Raymond Chi-Wing Wong, Hong Kong University of Science and Technology
Monica Scannapieco, Istat
Xiaokui Xiao, NTU
Bernhard Seeger, Philipps-Universität Marburg
Dong Xin, Google
Pierre Senellart, Télécom ParisTech
Xifeng Yan, University of Santa Barbara
Cyrus Shahabi, USC
Jiong Yang, Case Western Reserve University
Lidan Shou, Zhejiang University
Ke Yi, Hong Kong University of Science and Technology
Adam Silberstein, Trifacta
Man Lung Yiu, Hong Kong Polytechnic University
Radu Sion, Stony Brook University
Cong Yu, Google Research
Yannis Sismanis, IBM, USA
Ge Yu, Northeastern University, China
Mohamed Soliman, University of Waterloo
Jeffrey Yu, Chinese University of Hong Kong
Julia Stoyanovich, Drexel University and Skoltech
Wenjie Zhang, UNSW Australia
Yufei Tao, Chinese University of Hong Kong
Baihua Zheng, Singapore Management University
Sandeep Tata, IBM Research
Aoying Zhou, East China Normal University
Nesime Tatbul, ETH Zurich
Xiaofang Zhou, University of Queensland

Demonstration Program Committee
Anastasia Ailamaki, EPFL
Nick Koudas, University of Toronto
Sihem Amer-Yahia, Qatar Computing Research Institute
Nikos Mamoulis, University of Hong Kong
Giansalvatore Mecca, Università della Basilicata
Leopoldo Bertossi, University of Carleton
Alexandra Meliou, University of Washington
Francois Bry, University of Munich
Rachel Pottinger, University of British Columbia
Chee-Yong Chan, National University of Singapore
Rajeev Rastogi, Yahoo! India
Kevin Chang, UIUC
Bernhard Seeger, University of Marburg
Chin-Wan Chung, Korea Advanced Institute of SaT
Ambuj Singh, University of California, Santa Barbara
Gautam Das, University of Texas, Arlington
Jens Teubner, ETH Zurich
Aris Gkoulalas-Divanis, IBM Research Ireland
Wei Wang, University of New South Wales
Torsten Grust, Universität Tübingen
Li Xiong, Emory University
Herodotos Herodotou, Microsoft Research
Jia Yuan Yu, IBM Research
Yoshiharu Ishikawa, Nagoya University
Demetris Zeinalipour, University of Cyprus
Flip Korn, AT&T Labs
Shuigeng Zhou, Fudan University

Industrial Track Committee
Michael Brodie, Verizon
Felix Naumann, University of Potsdam
Alejandro Buchmann, Technische Universität Darmstadt
Fatma Ozcan, IBM Research
Shimin Chen, HP Labs China
Radu Popescu-Zeletin, Fraunhofer-Institut für Offene Kommunikationssysteme
Umeshwar Dayal, HP Labs
Raghu Ramakrishnan, Microsoft
Shel Finkelstein, SAP
Jun Rao, LinkedIn
Dieter Gawlick, Oracle
Len Seligman, MITRE
Tasos Kementsietsidis, T.J. Watson Research Center
Eric Simon, SAP
Tim Kraska, Brown University
Haixun Wang, Microsoft Research
Yue Lu, Twitter
Fei Wu, Google Research
Arnab Nandi, The Ohio State University
Jackie Xiang, Foursquare

Reproducibility Committee
Matias Bjørling, IT University of Copenhagen
Mian Lu, Hong Kong University of Science and Technology
Wei Cao, Renmin University
Dan Olteanu, University of Oxford
Stratos Idreos, Centrum Wiskunde & Informatica
Paolo Papotti, Qatar Computing Research Institute (QCRI)
Ryan Johnson, University of Toronto
Martin Kaufmann, ETH Zurich
Ben Sowell, Cornell University
David Koop, University of Utah
Lucja Kot, Cornell University
Radu Stoica, EPFL - Ecole Polytechnique Federale de Lausanne
Willis Lang, University of Wisconsin
Dimitris Tsirogiannis, Microsoft Jim Gray Systems Lab

PhD Workshop Chairs / Tutorial Chairs
Angela Bonifati, Icar-CNR
Serge Abiteboul, INRIA
Sanjay Chawla, University of Sydney
Gianni Mecca, Università della Basilicata
Chris Jermaine, Rice University
Haixun Wang, Microsoft Research Asia

Panel Chairs / Sponsorship Chairs
Shivnath Babu, Duke University
Sam Madden, Massachusetts Institute of Technology
Stavros Harizopoulos, Nou Data
Vassilis Vassalos, Athens Univ. of Econ. and Business
Ihab Ilyas, Qatar Computing Research Institute
Paolo Merialdo, Università Roma Tre

Publicity Chair
Tasos Kementsietsidis, IBM T.J. Watson Research Center

Proceedings Chairs
Peer Kröger, Ludwig-Maximilians University, Munich
Stratis D. Viglas, University of Edinburgh

Web Management Chair
Francesco Guerra, University of Modena and Reggio Emilia

Treasury Chair
Marios Hadjieleftheriou, AT&T Labs Research

Local Administration
Ufficio Convegni and dbTrento Group, University of Trento

Logo Design
Sakis Palpanas

PVLDB Information Director
Gerald Weber, University of Auckland

PVLDB Advisory Committee
Philip Bernstein, Michael Böhlen, Peter Buneman, Susan Davidson, Z. Meral Ozsoyoglu, S.
Sudarshan, Gerhard Weikum

LETTER FROM THE GUEST EDITORS

It is our pleasure to present the eighth issue of Volume 6 of the Proceedings of the VLDB Endowment (PVLDB). This issue contains five papers, all from the Research Track. Topics include data analytics, novel storage, distributed query optimization, the complexity of query result diversification, and data streams. These and other papers will be presented at the VLDB 2013 Conference, to be held in Riva del Garda, Trento, Italy, August 26-30, 2013.

All of these papers were reviewed in a journal-style review and revision process with monthly, year-round submission cycles. As Associate Editors, we have enjoyed overseeing the review process of the excellent papers accepted to VLDB 2013. We thank the authors for their submissions and for their willingness to incorporate the feedback they received in a timely manner. We also thank the Review Board for ensuring the scientific and technical quality of the papers with thorough reviews and on-time discussions. It has been a pleasure to work with an outstanding set of editorial colleagues. We hope that this new process will continue in the coming years.

These proceedings would not have been possible without the continuous involvement of our colleagues in the research community. We are proud to be part of such a team and we look forward to seeing you in Riva del Garda.

Sihem Amer-Yahia, CNRS
Stefan Manegold, CWI
Associate Editors, PVLDB 2013