WWW’09 Workshop Proposal: Semantic Search

Marko Grobelnik, Jožef Stefan Institute, Ljubljana, Slovenia
Peter Mika, Yahoo! Research, Barcelona, Spain
Thanh Tran Duc, Institute AIFB, University of Karlsruhe (TH), Germany
Haofen Wang, Apex Data & Knowledge Management Lab, Shanghai Jiao Tong University, China

Executive Summary

Semantic technologies, namely expressive ontology and resource description languages, scalable repositories, reasoning engines and information extraction techniques, are now mature enough to enable a higher level of semantic underpinning in real-world Information Retrieval (IR) systems. This application of semantic technologies to IR tasks is typically referred to as Semantic Search. Challenges along the way include (i) identifying tasks and paradigms for semantic search systems, (ii) devising expressive annotation frameworks as well as scalable algorithms and infrastructures, (iii) investigating innovative query paradigms for semantic search systems, and (iv) applying machine learning and information extraction techniques in the context of semantic search.

Topic and Scope

In recent years we have witnessed tremendous interest in and substantial economic exploitation of search technologies, both at web and enterprise scale. However, existing search appliances still represent user queries and resource content almost exclusively through simple syntax-based descriptions of the resource content and the information need, as in the predominant keyword-centric paradigm (i.e. keyword queries matched against bag-of-words document representations). While these systems have been shown to work well for topical search, i.e. retrieving documents based on a topic, they work on the basis of rough approximations and usually fail to address more complex information needs.
On the other hand, recent advances in the field of semantic technologies have resulted in tools and standards that allow for the articulation of domain knowledge in a formal manner at a high level of expressivity. At the same time, semantic repositories and reasoning engines have only now advanced to a state where querying and processing of this knowledge can scale to realistic IR scenarios. As such, semantic technologies are now in a position to make significant contributions to IR problems. More expressive descriptions of resources can be achieved through the conceptual representation of the actual resource content and the collaborative annotation of general resource metadata using standard Semantic Web languages. As a result, there is high potential that complex information needs can be supported by applying Semantic Web technologies to IR, where expressive queries can be matched against expressive resource descriptions.

In parallel to these developments, the past years have also seen the emergence of important results in adapting ideas from IR to the problem of search in RDF/OWL data, folksonomies, microformat collections or semantically tagged natural text. Common to these scenarios is that the search is focused not on a document collection, but on metadata (which may be linked to or embedded in textual information). Search and ranking in metadata stores is another key topic addressed by this workshop.

The immediate relevance of the topic for the Semantic / Data Web track as addressed at the WWW’09 conference arises from two aspects. On the one hand, search technology is a dominant technology for direct interaction with end-users, both in web and enterprise settings. However, research efforts in the Semantic Web community in recent years have largely targeted other fields.
On the other hand, the success of search engines like Google™, which do not explicitly utilize semantic technologies, challenges the predominant public notion of the Semantic Web as a web that “will yield better search results”.

WWW is the best venue for this workshop, as it covers interdisciplinary topics between the Semantic Web and search. Recent trends, such as a significant number of publications at ISWC+ASWC’07 and ESWC’08 that would fit into the workshop scope, support the need for a forum that explicitly targets Semantic Search. In particular, our previous workshop (i.e. the first workshop on “semantic search”) was among the biggest ones at ESWC’08 and attracted the highest number of submissions.

Challenges

In this context, several challenges arise for Semantic Search systems. These include, among others:

- How can semantic technologies be exploited to capture the information need of the user?
- How can the information need of the user be translated to expressive formal queries without requiring the user to master a difficult query syntax?
- How can expressive resource descriptions be extracted (acquired) from documents (users)?
- How can expressive resource descriptions be stored and queried efficiently on a large scale?
- How can vague information needs and incomplete resource descriptions be handled?
- How can semantic search systems be evaluated and compared with standard IR systems?
Topics

Main topics of interest for the envisioned workshop contributions include (but are not limited to) the following areas:

Tasks and interaction paradigms for semantic search
  - Information retrieval tasks on the semantic web
  - Incentives and interaction paradigms for resource annotation
  - Interaction paradigms for semantic search
  - Collaborative aspects of semantic search (wikis, social networks)

Query construction and resource modelling for semantic search
  - Semantic technologies for query interpretation, refinement and routing
  - Natural language interfaces for semantic web repositories
  - Modelling expressive resource descriptions
  - Ontology and metadata standards for expressive resource descriptions
  - Natural language processing and information extraction for the acquisition of resource descriptions
  - Semantic web mining and semantic network analysis

Algorithms and infrastructures for semantic search
  - Scalable reasoners, repositories and infrastructures for semantic search
  - Crawling, storing and indexing of expressive resource descriptions
  - Fusion of semantic search results on the semantic web
  - Algorithms for matching expressive queries and resource descriptions
  - Algorithms and reasoning procedures to deal with vagueness, incompleteness and inconsistencies in semantic search

Evaluation of semantic search
  - Evaluation methodologies for semantic search
  - Standard datasets and benchmarks for semantic search

Community and related activities

Intended Audience

The workshop is of interest to researchers in Semantic Web, Information Retrieval, Information Extraction and User Interaction with research interests at the intersection of these fields. The following research projects address or partially address these fields, and their respective project coordinators have indicated that they wish to act as sponsors, to advertise the workshop among their members and to promote attendance.
- X-Media - Knowledge Sharing and Reuse across Media (EU IST IP)
- NEON - Lifecycle Support for Networked Ontologies (EU IST IP)
- PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning (EU IST NOE)
- ACTIVE - Enabling the Knowledge Powered Enterprise (EU ICT IP)
- THESEUS (German Federal Ministry of Economy and Technology Research Program)

The EU IP projects LarKC, OKKAM, and WeKnowIt as well as the EU STREP projects SMARTMUSEUM, KIWI, and JUMAS will provide additional support in dissemination.

Recent related Events

The following recent events have addressed issues related to the topic of Semantic Search:

- Workshop on Semantic Search, ESWC 2008, Tenerife, Spain
- Workshop on Web Search Technology, ASWC 2006, Beijing, China
- Workshop on Learning in Web Search, ICML 2005, Bonn, Germany
- Workshop on Learning and Extending Lexical Ontologies by using Machine Learning Methods, ICML 2005, Bonn, Germany
- Workshop on Knowledge Discovery and Ontologies at ECML 2005, Porto, Portugal
- 2nd European Web Mining Forum at ECML/PKDD 2005, Porto, Portugal
- Workshop on Mining for and from the Semantic Web at KDD 2004, Seattle
- Workshop on Semantic Network Analysis at ISWC 2005, Galway

Organization

The workshop will preferably be held as a full-day workshop and will feature two invited talks of one hour each, one in the morning and one in the afternoon. As a networking opportunity, the workshop will also devote one hour to presentations of related research projects, their view on Semantic Search and possible synergies. Each submission will be thoroughly reviewed by at least three reviewers, two of whom should represent the two main perspectives, i.e. Semantic Technologies and Information Retrieval.

We will publicize the workshop via mailing lists (the respective W3C, Semantic Web and Information Retrieval lists and available related project lists) and by contacting participants of previously held related workshops.
Additionally, we will provide links from the homepages of our institutes and of related projects. Important dates will be aligned with the overall WWW organization.

Program Committee

We target a balanced program committee that includes experienced researchers with a strong research record in the relevant research areas of Semantic Technologies and Information Retrieval (including Information Extraction, Text Mining and Multimedia Retrieval). Specifically, the following people have indicated their willingness to review submissions and to disseminate the workshop:

- Bettina Berendt, University of Leuven, Belgium
- Paul Buitelaar, DFKI Saarbrücken, Germany
- Wray Buntine, NICTA Canberra, Australia
- Pablo Castells, Universidad Autónoma de Madrid, Spain (to be confirmed)
- Philipp Cimiano, Institute AIFB, University of Karlsruhe, Germany
- Fabio Ciravegna, University of Sheffield, UK
- Blaz Fortuna, Jožef Stefan Institute, Slovenia
- Lise Getoor, University of Maryland, USA
- Rayid Ghani, Accenture Labs, USA
- Peter Haase, Institute AIFB, University of Karlsruhe, Germany
- Andreas Hotho, University of Kassel, Germany
- Esther Kaufmann, University of Zurich, Switzerland
- Yiannis Kompatsiaris, Informatics and Telematics Institute, Greece
- Eduarda Mendes Rodrigues, Microsoft Research, Cambridge, UK
- Steffen Staab, University of Koblenz-Landau, Germany
- Nenad Stojanovic, FZI Karlsruhe, Germany
- Rudi Studer, Institute AIFB, University of Karlsruhe, Germany
- Raphael Volz, FZI Karlsruhe, Germany
- Michael Witbrock, Cycorp, USA
- Ilya Zaihrayeu, University of Trento, Italy
- Hugo Zaragoza, Yahoo! Research Barcelona, Spain
- Yong Yu, Shanghai Jiao Tong University, China

Background on workshop organizers

In the following, we provide background information about the workshop organizers.
Marko Grobelnik
Jožef Stefan Institute, Department for Intelligent Systems
Jamova 39, SLO-1000 Ljubljana, Slovenia
Office Phone: +386.61.1773-778; eMail: Marko.Grobelnik@ijs.si
http://www-ai.ijs.si/MarkoGrobelnik/

Marko is a researcher and manager of a research group of 15 people at the Knowledge Department, working primarily in the areas of text mining and social network analysis. He is coauthor of several books and numerous scientific papers. Marko is technical director of the FP6 IST World project on the analysis of European research, a member of the management board of several FP6 projects, and a participant in W3C standardization committees. He has co-organized over 10 international workshops and tutorials on text mining and link analysis at prominent conferences such as IJCAI, ACM KDD and IEEE ICDM. Marko also collaborates closely on research projects with Microsoft Research, Cycorp Europe, Carnegie Mellon University and Cornell University.

Peter Mika
Yahoo! Research, Barcelona Lab
Ocata 1, 1st floor, E-08003, Barcelona, Spain
Office Phone: +34.935.421-165; eMail: pmika@yahoo-inc.com
http://research.yahoo.com/Peter_Mika

Peter is a researcher at Yahoo! Research, Barcelona. He obtained his PhD in 2007 from the Business Informatics group of the Faculty of Sciences (FEW) at the Vrije Universiteit, Amsterdam. His research focuses on Search Technologies, Semantic Technologies and Social Networks. His interdisciplinary work in the field of Social Networks and the Semantic Web earned a Best Paper Award at the International Semantic Web Conference in Galway, 2005, and a First Prize at the Semantic Web Challenge of 2005. He is also the author of the book “Social Networks and the Semantic Web”, published in 2007 by Springer Verlag. He has been involved in several large European Semantic Web projects such as On-To-Knowledge, SWAP (Semantic Web and Peer-to-Peer) and WonderWeb.
Thanh Tran Duc
University of Karlsruhe, Institute AIFB, Knowledge Management Research Group
D-76128 Karlsruhe, Germany
Office Phone: +49.721.608-7363; eMail: dtr@aifb.uni-karlsruhe.de

Thanh is a research associate and PhD student at the Institute AIFB, University of Karlsruhe (TH). He has been awarded two degrees, a Master of Commerce from Macquarie University, Australia, and a Master of Business Information Systems from the Otto von Guericke University. He has worked as a project associate and software engineer for IBM and Capgemini. His interdisciplinary work in the fields of Knowledge Representation, Databases and Information Retrieval has been published in numerous proceedings and journals (ICDE, WWW, ISWC). He is currently involved in a large European Semantic Web project called X-Media.

Haofen Wang
Shanghai Jiao Tong University, Apex Data & Knowledge Management Lab
800 Dongchuan Road, Shanghai, China
Office Phone: +86.21.5474-5879; eMail: whfcarter@apex.sjtu.edu.cn
http://apex.sjtu.edu.cn/apex_wiki/whfcarter

Haofen is a research associate and PhD student at the Apex Data & Knowledge Management Lab, Shanghai Jiao Tong University. He received his master's degree in Computer Science and Engineering from Shanghai Jiao Tong University. His research interests include semantic data creation and integration, Semantic Web data indexing and search, and query interfaces and user interaction for the Semantic Web. He has published several high-quality papers and has served as a program committee member and reviewer for various conferences and journals on these topics. Haofen has also successfully led several joint research projects with IBM China Research Laboratory and Intel Research China.