PRINCIPAL RESEARCH PROJECTS SUBMISSIONS

Please keep comments as succinct as possible and do not exceed 3 pages in total. This form will be published on the EPSRC’s website as part of the publication of evidence received by the panel.

Institution(s): University of Edinburgh
Institute/Group Name: School of Informatics, Database Group, LFCS
Main Contact: Peter Buneman
Institute/Group Size (FTE):
Academic (FTE): Lecturers 4
Researchers (FTE): 5–7
PG Students (FTE): 10 or more
Group webpage: DCC http://www.dcc.ac.uk/ and UoE Database Group http://www.lfcs.inf.ed.ac.uk/research/database/

Strategic Vision Statement of e-Science research (200 words max.):

The most exciting interactions between computer sciences and other sciences are those in which there is engagement between the principles of the disciplines. Examples include quantum computing, constraints systems in physics, computational models of biological systems, computational linguistics, new models in economics, etc. Some of this is described in the Microsoft "Towards 2020 Science" report. Unfortunately, e-Science does not appear to have been interpreted by the UK funding agencies to include these topics.

Research Themes (the programme review structure is based on the following themes. Please identify those relevant to your research numbering them in order of priority. Please also give a brief summary of your research focus in each theme and give the key lead contact(s)) (200 words max.):

x Data and Information Management
x Sharing and Collaboration
Distributed Research Infrastructures
Research Tools and Techniques
Physical Sciences and Engineering
Medicinal and Biological Sciences
Social Sciences, Arts and Humanities
Environmental Sciences

Peter Buneman – semistructured data, curated databases, query languages.
Wenfei Fan – XML integration, data integration, data transformation, data cleaning, distributed query evaluation, web services.
Leonid Libkin – logic, XML constraints, data exchange.
Stratis Viglas – new storage architectures, distributed query processing, data representations.

Key Research Highlights over last ten years (these might include journal papers, awards, patents, spinout activity, etc. Please be selective and choose a top five similar to RAE).

Peter Buneman: FRS, FACM, RS Wolfson Merit Award
Wenfei Fan: BCS Needham Award
Leonid Libkin: Marie Curie Chair in CS

Wenfei Fan, Floris Geerts, Xibei Jia and Anastasios Kementsietsidis. Conditional Functional Dependencies for Capturing Data Inconsistencies. ACM Transactions on Database Systems 33(2) (2008). [This was the topic of his Needham Award talk and of initial commercialisation]

Wenfei Fan, Chee Yong Chan and M. Garofalakis. Secure XML Querying with Security Views. SIGMOD, 2004. [One of the most highly cited papers on XML security]

Marcelo Arenas, Leonid Libkin. A normal form for XML documents. ACM Transactions on Database Systems 29: 195-232 (2004). [An elegant transfer of DB principles to XML]

Michael Benedikt, Leonid Libkin. Relational queries over interpreted structures. Journal of the ACM 47(4): 644-680 (2000). [One of Leonid’s deepest papers]

Peter Buneman, James Cheney, and Stijn Vansummeren. On the expressiveness of implicit provenance in query and update languages. In Proceedings of the 2007 International Conference on Database Theory. Springer, Jan 2007. [Not hugely cited, but one of my nicest recent ideas]

Peter Buneman, Sanjeev Khanna, and Wang Chiew Tan. Why and Where: A Characterization of Data Provenance. In Jan Van den Bussche and Victor Vianu, editors, International Conference on Database Theory, pages 316-330. Springer, LNCS 1973, 2001. [Has started a minor industry]

Key Research results taken up by industry and/or other users and/or policy makers over the past ten years (please provide information about the technologies and how they were adopted - 200 words max.).
Comprehension syntax (Buneman et al.): adopted by most query languages and for interface languages such as C-omega.
Data cleaning techniques (Fan et al.): in early stages of commercialisation.
Constraints for semistructured data and XML: initiated by Buneman, Fan and Libkin, and now a wide research area.
Data provenance: initiated by Buneman. Broad impact in computer science.

Graduate Student Research Training (Please provide brief details of type of training provided, including current numbers of students as well as trends over the last ten years, country of origin, gender and first destination analysis).

Very rough guess: 10-20 PhD students produced by Buneman, Fan and Libkin, mostly in academia. An internationally diverse group.

Funding - Describe ongoing support for e-Science enabled research. Please aggregate the data where possible (or please use HESA headings – RC, other public funding etc.).

Buneman:
EPSRC Platform Grant: Heterogeneous and Permanent Data (2008-12, ~£1.4m)
Marie Curie Chair Award (PI, beneficiary Leonid Libkin) (2005-08, ~£400k)
EPSRC Visitor, Prof. Marc Scholl (2005, £6k)
EPSRC (PI, with others) Digital Curation Centre Research (2004-07, ~£1.1m)
EPSRC: Vectorised XML (PI, with Douglas Armstrong) (2003-06, £233k)
Royal Society Wolfson Merit Award (2002-07, ~£275k)
Digital Libraries (NSF, DARPA, NLM) (PI with Davidson et al.) (1999-2002, £500k)

Fan:
EPSRC Follow on Fund (PI) (2009-10, £102k)
EPSRC data cleaning project (PI) (2007-10, £481k)
EPSRC XML security project (PI) (2005-2008, £310k)
Royal Society of Edinburgh Enterprise Fellowship (PI) (2008-2009, £87k)
EPSRC XML publishing (co-PI) (2004-2007, £338k)
Distinguished Overseas Young Scholar Award (PI) (2003-2005, RMB 400k)
NSF Career Award (PI) (2001-2006, US$300k)

Libkin:
Databases and interpreted data: constraints, strings, documents (PI), NSERC (2001-04, CDN$140k)
Databases on the web: design, semantics, and query processing (PI), PREA Grant (2002-2006, CDN$150k)
IBM Fellowship: Design Principles for XML Data (2004-05, CDN$29k)
Designing, querying, and exchanging web data (PI), NSERC (2005-10, CDN$250k)
XML Data (PI), EU Marie Curie Chair Programme (2006-09, €480k)
Relational and XML Data Exchange: Semantics, Consistency, and Query Answering (PI), EPSRC (2006-09, £458k)
XML with Incomplete Information: Representation, Querying, and Applications (PI), EPSRC (2009-13, £577k)

Key Collaborators (Academic and non academic, including overseas, please provide details about main groups, countries and sources of funding (200 words max.))

Adriane Chapman, U. Michigan/Mitre Corp
Alin Deutsch, UCSD
Alon Y. Halevy, Microsoft (Redmond)
Anastasios Kementsietsidis, IBM TJ Watson
Antonella Poggi, U. Rome "La Sapienza"
Arek Kasprzyk, EBI
Atsushi Ohori, Tohoku University
Benjamin C. Pierce, U. Pennsylvania
Carmem S. Hara, U. Federal do Parana (Br)
Chee Yong Chan, National U. of Singapore
Christoph Koch, Cornell
Cristina Sirangelo, ENS, Paris
Dan Suciu, University of Washington
David J. DeWitt, University of Wisconsin
David Maier, Portland State University
Fausto Rabitti, CNR, Italy
Feng Tian, Chinese Academy of Sciences
Frank Neven, Hasselt University, Belgium
Gabriel M. Kuper, University of Trento
Gao Cong, University of Aarhus
Georg Gottlob, TU Wien/Oxford
Guozhu Dong, Wright State University
Hans-Jörg Schek, ETH Zurich
Hongjun Lu, Hong Kong U. of Sci. & Tech.
Hongwei Wu, Tsinghua University
Jayavel Shanmugasundaram, Cornell Uni.
Jeffrey F. Naughton, University of Wisconsin
Jeffrey Xu Yu, University of Tsukuba, Japan
Jianchang Xiao, Fudan University
Jianhua Lu, Tsinghua University
Jianzhong Li, Harbin Inst. of Tech., China
Juha Nurmonen, Helsinki University
Juliana Freire, University of Utah
János Demetrovics, Hungarian Ac. of Sci.
Jérôme Siméon, IBM TJ Watson
Kun Yue, Yunnan University
Laks V. S. Lakshmanan, U. British Columbia
Laurent Mignet, IBM India
Le Gruenwald, University of Oklahoma
Limsoon Wong, National U. of Singapore
Loreto Bravo, U. de Concepción, Chile
Louiqa Raschid, University of Maryland
Luc Segoufin, ENS, Paris
Marcelo Arenas, Pontificia U. Católica de Chile
Martin Grohe, Humboldt-Universität zu Berlin
Martín Abadi, University of Santa Cruz
Mary F. Fernández, AT&T Labs
Michael Benedikt, Oxford University
Michael Flaster, Google
Ming Xiong, Bell Laboratories
Minos N. Garofalakis, Technical U. of Crete
Neil Immerman, UMass Amherst
Pablo Barceló, University of Chile
Per-Åke Larson, Microsoft Redmond
Philip Bohannon, Yahoo!
Qiong Luo, Hong Kong U. of Sci. & Tech.
Rajasekar Krishnamurthy, IBM Almaden
Rajeev Alur, University of Pennsylvania
Rajeev Rastogi, Yahoo! Bangalore
Richard Hull, IBM TJ Watson
Ronald Fagin, IBM Almaden
Sanjeev Khanna, U. Pennsylvania
Scott Weinstein, U. Pennsylvania
Serge Abiteboul, University of Paris Orsay
Stefano Ceri, University of Milan
Stijn Vansummeren, University of Brussels
Susan B. Davidson, U. Pennsylvania
Thomas Eiter, Technical University of Vienna
Thomas Schwentick, University of Passau
Timos K. Sellis, National Tech. U. of Athens
Timothy G. Griffin, University of Cambridge
Tova Milo, Tel Aviv University
Vassilis Christophides, FORTH, U. of Crete
Victor Vianu, UCSD
Wang Chiew Tan, U. of California Santa Cruz
Wenguang Chen, Beijing University
Wouter Gelade, University of Hasselt
Yannis E. Ioannidis, University of Athens
Zhaohui Wu, Zhejiang University, Hangzhou

Use and access to facilities (please provide details of main facilities used, any issues relating to access and frequency of usage. This can include e-Science infrastructure (i.e. NGS, OMII, DCC, etc.) and others, including campus facilities).

Normal university computing facilities.

Public engagement activities (please provide details of activities such as public lectures, engaging with the media etc (200 words max.)).

Buneman has given several public lectures on curated databases and digital curation at universities, libraries and related institutions. Fan has delivered public lectures in London (Royal Society), Edinburgh and China on data cleaning. Although not directly connected with e-Science funding, Buneman has also received substantial press coverage over rural internet issues.