Research Output Repository Platform (code name “Famulus”)
Alex Wade – Research Program Manager
Savas Parastatidis – Software Philosopher
eScience Workshop 2008

Tutorial Objectives And Takeaways
• Overview of Semantic Computing concepts
• Context of Research Repositories
• How to leverage our technologies to build a solution for the digital repositories community
• Takeaway: How to use MS technologies in order to build an extensible semantic computing platform
• Takeaway: The role of digital repositories in the future academic and research environments and what MS can offer

Outline
• Background
  – Semantic Computing
  – Research-output Repositories
• MSR’s Repository Platform
  – Features
  – Architecture
• Demos
• Wrap-up
But we love to improvise! So let’s make the session interactive

Semantics
• Term used to refer to the concept of “meaning”
  – The linguistics, AI, Natural Language Processing, etc. communities have been working on “meaning” and “knowledge” related technologies for decades
• Semantic Computing
  – Emergence of a new breed of technologies to capture meaning (RDF, OWL, etc.)
  – Combine with the pervasiveness of the Web community technologies such as folksonomies
  …

What is Semantic Computing?
• Set of concepts and technologies
  – Data modeling
  – Relationships
  – Ontologies
  – Machine learning (entity extraction)
  – Inference, reasoning
  – Data, information, knowledge…

[Figure labels: Data, Information, Knowledge, Intelligence, Wisdom; Current technologies; Possibilities for innovation]

Semantic Computing
Set of technologies to...
• Model data and their connections
  – e.g. RDF, Topic Maps, Unified Content Descriptors
• Capture concepts and their relationships
  – e.g. OWL
• Query data and produce information
  – e.g. SPARQL
• Reason about data, concepts, information
  – e.g. Pellet
• Extract structured information (machine learning)
  – e.g. Live Labs entity extraction (http://labs.live.com/Entity+Extraction.aspx)

Today…
Computers are great tools for storing, computing, managing, and indexing huge amounts of data
For example, Google and Microsoft both have copies of the Web for indexing purposes

Tomorrow…
Computers are great tools for storing, computing, managing, and indexing huge amounts of data
We would like computers to also help with the automatic acquisition, discovery, aggregation, organization, correlation, analysis, interpretation, and inference of the world’s information

RESEARCH-OUTPUT REPOSITORIES

Background
• Traditional research output = Journal articles
  – Pros: peer-review, indexed, archived
  – Cons: timeliness, cost, access, format limits
• Response = Digital Repositories
  – Subject Repositories
    • arXiv.org (Physics, Math, CS)
    • PubMed Central (Biomedical)
  – Institutional Repositories
  – Data Type repositories
    • Data sets, presentations, workflows, etc.

The Information Lifecycle
Research & Analysis
• Research Information Center
• Researcher Desktop
Authoring
• Article Authoring Add-in for Word
• Creative Commons Office Add-in
• Semantic Annotations in Word
• Reproducible Research in Word
Publication & Dissemination
• eJournal Service
• Conference Management Tool
Storage & Archiving
• Research Output Repository

Famulus
A platform for building services and tools for research output repositories
• Papers, Videos, Presentations, Lectures, References, Data, Code, etc.
• Relationships between stored entities
[Diagram labels: Famulus platform; UIs, Desktop Tools, Search, Interop, Syndication, Services]

Goals
• Enable a tools and services ecosystem for “research output” repositories on MS technologies
  – Interoperability as one of the primary goals
• Modeling
  – RDFS – RDF Schema
• Syndication and Re-Use
  – RSS/Atom
  – OAI-PMH – Protocol for Metadata Harvesting
  – OAI-ORE – Object Re-Use and Exchange
• Ingest & publishing protocols
  – SWORD – Simple Web-service Offering Repository Deposit
  – AtomPub
  – BibTeX

Famulus architecture goals
Goals
• Create a platform for building “research output” repositories
• Engage with the digital library and scholarly communications community
• Support an ecosystem of services and tools
• Available to the community for free (we are still considering the open source route)
• Build an easy-to-install collection of basic services and tools
Non-goals
• A generic platform for asset management
• Support the lifecycle of publications
• Compete with existing repository solutions

[Architecture diagram layers, in the order listed:]
• 3rd-party services, tools, applications
• Famulus services, Web, interoperability
• Famulus Platform (Based on the Entity Framework + Data model)
• SQL Server 2008, MS data storage technologies, Entity Framework runtime, .NET 3.5, LINQ

Research Output Repository Platform
• A Semantic Computing platform
• A hybrid between a relational database and a triple store

Triple stores
- Evolution friendly
- Poor performance
- No need to model everything in advance
- Semantic interpretation at the application level
Relational schema
- Evolution not so easy
- Great opportunities for optimization
- Model everything in advance
Famulus Platform
- Maintain a balance
- Try to model the frequently used entities in our app domain
- Try to capture the frequently used relationships
- Allow for extensibility (Relationships, Properties)

An intuitive programming experience
    // Create an author and two publications
    Person tony = new Person();
    Publication pub1 = new Publication();
    pub1.Title = "Title1";
    Publication pub2 = new Publication();
    pub2.Title = "Title2";
    // pub1 cites pub2 and is authored by tony
    pub1.Cites.Add(pub2);
    pub1.Authors.Add(tony);
    // Tag pub1 with a keyword
    Tag tag = new Tag();
    tag.Name = "keyword";
    pub1.Tags.Add(tag);

Famulus Platform
[Diagram labels: PDF file, contains, is representation of, Lecture on 2/19/2008, PowerPoint presentation, authored by, organized by, presented by, tony, Elizabeth, Sebastien, Matthew, Norman, Brian, Sarah, George, Roy, Data Mesh]

Release Roadmap
• Customer Technology Preview has been released
  – Requires SQL Server 2008 (Express)
• Public beta January-February 09 timeframe
• RTM ???

DEMO

© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
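
Appendix sketch for the “An intuitive programming experience” slide
The snippet on that slide uses Person, Publication and Tag without showing their definitions. The following is a minimal sketch of plain in-memory classes under which the slide's statements would compile and run. The class shapes are assumptions inferred only from the member names on the slide (Title, Cites, Authors, Tags, Name). The actual Famulus entities are built on the Entity Framework and are not shown in this deck, so this is not the platform's real API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the Famulus entity classes (assumption:
// only the member names below appear on the slide; everything else is illustrative).
class Person { }

class Tag
{
    public string Name { get; set; }
}

class Publication
{
    public string Title { get; set; }

    // Frequently used relationships are modeled as typed collections.
    public List<Publication> Cites { get; private set; }
    public List<Person> Authors { get; private set; }
    public List<Tag> Tags { get; private set; }

    public Publication()
    {
        Cites = new List<Publication>();
        Authors = new List<Person>();
        Tags = new List<Tag>();
    }
}

static class Program
{
    static void Main()
    {
        // Statements below are taken from the slide.
        Person tony = new Person();
        Publication pub1 = new Publication();
        pub1.Title = "Title1";
        Publication pub2 = new Publication();
        pub2.Title = "Title2";
        pub1.Cites.Add(pub2);
        pub1.Authors.Add(tony);
        Tag tag = new Tag();
        tag.Name = "keyword";
        pub1.Tags.Add(tag);

        // Relationships are then navigated as ordinary object references.
        foreach (Publication cited in pub1.Cites)
        {
            Console.WriteLine(pub1.Title + " cites " + cited.Title);
        }
    }
}
```

The point of the sketch is the design choice named on the earlier slide: frequently used entities and relationships get first-class, typed members, so client code reads like ordinary object manipulation. How the platform persists these objects, and how its extensibility for additional relationships and properties is exposed, is not covered here.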