Research Output Repository Platform Alex Wade – Research Program Manager

Research Output Repository Platform
(code name “Famulus”)
Alex Wade – Research Program Manager
Savas Parastatidis – Software Philosopher
eScience Workshop 2008
Tutorial Objectives And Takeaways
• Overview of Semantic Computing concepts
• Context of Research Repositories
• How to leverage our technologies to build a
solution for the digital repositories community
• Takeaway: How to use MS technologies in order
to build an extensible semantic computing
platform
• Takeaway: The role of digital repositories in the
future academic and research environments and
what MS can offer
Outline
• Background
– Semantic Computing
– Research-output Repositories
• MSR’s Repository Platform
– Features
– Architecture
• Demos
• Wrap-up
But we love to improvise!
So let’s make the session interactive
Semantics
• Term used to refer to the concept of “meaning”
– The linguistics, AI, Natural Language Processing, and related
communities have been working on “meaning”- and
“knowledge”-related technologies for decades
• Semantic Computing
– Emergence of a new breed of technologies to capture
meaning (RDF, OWL, etc.)
– Combine these with the pervasiveness of Web
community technologies such as folksonomies …
What is Semantic Computing?
• Set of concepts and technologies
– Data modeling
– Relationships
– Ontologies
– Machine learning (entity extraction)
– Inference, reasoning
Data, information, knowledge…
[Diagram: hierarchy running Data, Information, Knowledge, Intelligence, Wisdom, annotated with “Current technologies” and “Possibilities for innovation”]
Semantic Computing
Set of technologies to...
• Model data and their connections
– e.g. RDF, Topic Maps, Unified Content Descriptors
• Capture concepts and their relationships
– e.g. OWL
• Query data and produce information
– e.g. SPARQL
• Reason about data, concepts, information
– e.g. Pellet
• Extract structured information (machine learning)
– e.g. Live Labs entity extraction
(http://labs.live.com/Entity+Extraction.aspx)
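As a rough illustration of the modeling and querying bullets, the sketch below stores data as subject-predicate-object triples (the shape RDF uses) and answers a simple pattern query, loosely in the spirit of SPARQL. The Triple and TripleStore types and the sample identifiers are hypothetical, written only for this example; they are not part of any RDF toolkit or of Famulus.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory triple: (subject, predicate, object).
public sealed class Triple
{
    public string Subject { get; private set; }
    public string Predicate { get; private set; }
    public string Object { get; private set; }

    public Triple(string subject, string predicate, string obj)
    {
        Subject = subject;
        Predicate = predicate;
        Object = obj;
    }
}

public sealed class TripleStore
{
    private readonly List<Triple> triples = new List<Triple>();

    public void Add(string subject, string predicate, string obj)
    {
        triples.Add(new Triple(subject, predicate, obj));
    }

    // Pattern match: a null argument acts as a wildcard,
    // loosely like a variable in a SPARQL triple pattern.
    public IEnumerable<Triple> Match(string subject, string predicate, string obj)
    {
        return triples.Where(t =>
            (subject == null || t.Subject == subject) &&
            (predicate == null || t.Predicate == predicate) &&
            (obj == null || t.Object == obj));
    }
}

public static class Program
{
    public static void Main()
    {
        var store = new TripleStore();
        store.Add("paper1", "authoredBy", "tony");
        store.Add("paper1", "cites", "paper2");
        store.Add("paper2", "authoredBy", "tony");

        // "Which resources were authored by tony?"
        foreach (var t in store.Match(null, "authoredBy", "tony"))
        {
            Console.WriteLine(t.Subject);
        }
    }
}
```

Nothing has to be declared before a new predicate is used, which is the flexibility triple-based models trade against query performance.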
Today…
Computers are great tools for storing, computing, managing, and indexing huge amounts of data
For example, Google and Microsoft both have copies of the Web
for indexing purposes
Tomorrow…
Computers are great tools for storing, computing, managing, and indexing huge amounts of data
We would like computers to also help with the automatic acquisition, discovery, aggregation, organization, correlation, analysis, interpretation, and inference of the world’s information
RESEARCH-OUTPUT
REPOSITORIES
Background
• Traditional research output = Journal articles
– Pros: peer-review, indexed, archived
– Cons: timeliness, cost, access, format limits
• Response = Digital Repositories
– Subject Repositories
• arXiv.org (Physics, Math, CS)
• PubMed Central (Biomedical)
– Institutional Repositories
– Data Type repositories
• Data sets, presentations, workflows, etc.
The Information Lifecycle
Research & Analysis
• Research Information Center
• Researcher Desktop
Storage & Archiving
• Research Output Repository
Authoring
• Article Authoring Add-in for Word
• Creative Commons Office Add-in
• Semantic Annotations in Word
• Reproducible Research in Word
Publication & Dissemination
• eJournal Service
• Conference Management Tool
Famulus
A platform for building services and tools for
research output repositories
• Papers, Videos, Presentations, Lectures,
References, Data, Code, etc.
• Relationships between stored entities
Goals
• Enable a tools and services ecosystem for
“research output” repositories on MS
technologies
– Interoperability as one of the primary goals
[Diagram: the Famulus platform at the center, surrounded by UIs, Desktop Tools, Search, Interop, Syndication, and Services]
• Modeling
– RDFS – RDF Schema
• Syndication and Re-Use
– RSS/Atom
– OAI-PMH – Protocol for Metadata Harvesting
– OAI-ORE – Object Re-Use and Exchange
• Ingest & publishing protocols
– SWORD – Simple Web-service Offering Repository Deposit
– AtomPub
– BibTeX
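To make the harvesting protocol concrete: an OAI-PMH request is an HTTP request to a repository's base URL with a "verb" and its arguments as query parameters. The sketch below only builds such a URL. The OaiPmh helper class and the base URL are hypothetical; the ListRecords verb, the metadataPrefix and from arguments, and the oai_dc (Dublin Core) prefix come from the OAI-PMH specification.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OaiPmh
{
    // Builds an OAI-PMH request URL: base URL + verb + verb arguments.
    public static string BuildRequest(string baseUrl, string verb,
        IEnumerable<KeyValuePair<string, string>> arguments)
    {
        var pairs = new List<string> { "verb=" + Uri.EscapeDataString(verb) };
        pairs.AddRange(arguments.Select(a =>
            Uri.EscapeDataString(a.Key) + "=" + Uri.EscapeDataString(a.Value)));
        return baseUrl + "?" + string.Join("&", pairs.ToArray());
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical endpoint; a real repository publishes its own base URL.
        const string baseUrl = "http://repository.example.org/oai";

        var arguments = new List<KeyValuePair<string, string>>
        {
            // Dublin Core, the metadata format every OAI-PMH repository must support.
            new KeyValuePair<string, string>("metadataPrefix", "oai_dc"),
            // Selective harvesting: only records changed since this date.
            new KeyValuePair<string, string>("from", "2008-01-01")
        };

        Console.WriteLine(OaiPmh.BuildRequest(baseUrl, "ListRecords", arguments));
        // prints http://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2008-01-01
    }
}
```

A harvester would issue this request over HTTP and parse the XML response; that part is omitted here.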
Famulus architecture goals
Goals
• Create a platform for building “research output” repositories
• Engage with the digital library and scholarly communications community
• Support an ecosystem of services and tools
• Available to the community for free (we are still considering the open source route)
• Build an easy-to-install collection of basic services and tools
Non-goals
• A generic platform for asset management
• Support the lifecycle of publications
• Compete with existing repository solutions
3rd-party services, tools, applications
Famulus services, Web, interoperability
Famulus Platform (based on the Entity Framework + data model)
SQL Server 2008, MS data storage technologies, Entity Framework runtime, .NET 3.5, LINQ
Research Output Repository Platform
• A Semantic Computing platform
• A hybrid between a relational database and a triple store
Triple stores
– Evolution friendly
– Poor performance
– No need to model everything in advance
– Semantic interpretation at the application level
Relational schema
– Evolution not so easy
– Great opportunities for optimization
– Model everything in advance
Famulus Platform
– Maintain a balance
– Try to model the frequently used entities in our app domain
– Try to capture the frequently used relationships
– Allow for extensibility (Relationships, Properties)
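One way to picture the balance described above is a small sketch in which a frequently used attribute (Title) is modeled up front, as a relational column would be, while additional properties and relationship types are stored generically, triple-style. The Resource and Relationship classes, the property name, and the predicate name are all hypothetical illustrations, not the Famulus data model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Frequently used entity data: modeled in advance, like a relational column.
public class Resource
{
    public Guid Id = Guid.NewGuid();
    public string Title;

    // Extensibility: properties nobody modeled in advance, kept as name/value pairs.
    public readonly Dictionary<string, string> ExtraProperties =
        new Dictionary<string, string>();
}

// Extensible relationship, stored triple-style: (subject, predicate, object).
public class Relationship
{
    public Resource Subject;
    public string Predicate;
    public Resource Object;
}

public static class Program
{
    public static void Main()
    {
        var paper = new Resource { Title = "Title1" };
        var slides = new Resource { Title = "Slides for Title1" };

        // A property that was not part of the up-front model.
        paper.ExtraProperties["fundingBody"] = "Example Council";

        // A relationship type introduced later, without a schema change.
        var relationships = new List<Relationship>
        {
            new Relationship { Subject = slides, Predicate = "isRepresentationOf", Object = paper }
        };

        // Find everything that points at the paper.
        foreach (var rel in relationships.Where(x => x.Object == paper))
        {
            Console.WriteLine(rel.Subject.Title + " " + rel.Predicate + " " + rel.Object.Title);
        }
    }
}
```

The typed field can be indexed and optimized like any relational column, while the generic parts keep the evolution-friendliness of a triple store.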
An intuitive programming experience
// Create the entities
Person tony = new Person();
Publication pub1 = new Publication();
pub1.Title = "Title1";
Publication pub2 = new Publication();
pub2.Title = "Title2";
// Relate them: pub1 cites pub2 and is authored by tony
pub1.Cites.Add(pub2);
pub1.Authors.Add(tony);
// Tag pub1 with a keyword
Tag tag = new Tag();
tag.Name = "keyword";
pub1.Tags.Add(tag);
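The snippet above relies on entity classes whose relationships appear as ordinary collections. The platform's actual classes are not shown in these slides, so the following is only a minimal, self-contained sketch with hypothetical stand-in classes, enough to make the same calls compile and to show relationships being navigated as an object graph.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the platform's entity classes (not the Famulus API).
public class Person
{
    public string Name;
}

public class Tag
{
    public string Name;
}

public class Publication
{
    public string Title;
    public readonly List<Person> Authors = new List<Person>();
    public readonly List<Publication> Cites = new List<Publication>();
    public readonly List<Tag> Tags = new List<Tag>();
}

public static class Program
{
    public static void Main()
    {
        Person tony = new Person { Name = "tony" };
        Publication pub1 = new Publication { Title = "Title1" };
        Publication pub2 = new Publication { Title = "Title2" };

        pub1.Cites.Add(pub2);
        pub1.Authors.Add(tony);
        pub1.Tags.Add(new Tag { Name = "keyword" });

        // Relationships are then navigated like any other object graph.
        foreach (Publication cited in pub1.Cites)
        {
            Console.WriteLine(pub1.Title + " cites " + cited.Title);
        }
    }
}
```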
Famulus Platform
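Data Mesh
[Diagram: a mesh of entities (a PDF file, a PowerPoint presentation, a lecture on 2/19/2008, and the people tony, Elizabeth, Sebastien, Matthew, Norman, Brian, Sarah, George, Roy) linked by relationships labeled “contains”, “is representation of”, “authored by”, “organized by”, and “presented by”]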
Release Roadmap
• Customer Technology Preview has been released
– Requires SQL Server 2008 (Express)
• Public beta in the January-February 2009 timeframe
• RTM ???
DEMO
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market
conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.