DRIS development concept

Status: position paper of DRIS/BP TG euroCRIS, initiated by Sergey Parinov in January 2012
Editors: Sergey Parinov, Barbara Ebert, Keith Jeffery, …
Last release date: 2012.01.31

Table of Contents

Introduction
DRIS development diagram
Summary table of DRIS development tasks
DRIS development stages
  1. Basic DRIS
    Proposed initial tasks:
  2. DRIS as a federation of CRIS related information objects
    2.1. Proposed initial tasks:
    2.2. DRIS Extensions:
  3. DRIS as an Information Hub
    Proposed initial tasks:
    3.1. Information Hub instances: a list of URLs or a database
    3.2. Information Hub input/output protocols
    3.3. Conversion utility
    3.5. Information Hub content visualization: Pan European research portal
  4. DRIS + SLOR
    Proposed initial tasks:
    4.1. An idea of semantic linking tool
    4.2. Semantic vocabularies
    4.3. Proposed SLOR design
    4.4. SLOR benefits
  5. Extended DRIS as platform of research DIS and research e-infrastructure.
    Proposed initial tasks:
Summary table of expected benefits from extended DRIS

Introduction

We start the development of DRIS (Directory of Research Information Systems) from a collection of descriptions of already implemented CRIS. We then open registration of CRIS artifacts, each connected with the DRIS record of its parent CRIS.
To register "content" type artifacts we will use the CERIF list of entities (Projects, Persons, Organizations, Funding, Outputs {products, patents, and publications}, Events, Facilities, Equipment, Services, etc.). The second type of CRIS artifact is "software". In cooperation with the Architecture TG we will make a list of classes for this type of artifact. As a result, the CRIS community will have a catalogue of information about existing CRIS, with details about their content and the software they use. The catalogue can be updated and expanded in a decentralized mode.

Next we extend the specification of content type artifacts by adding fields for the parameters of a harvesting gateway (URL, protocol and available formats). The harvesting gateway data stored at DRIS will be rearranged as an information hub subsystem, which is open for input (any organization can register content artifacts and their harvesting gateway parameters) and for output (any user or software robot can freely take the gateway parameters from DRIS to harvest open metadata from the source CRIS). The DRIS information hub will provide a CERIF-XML interoperation mechanism and allow multiple reuse of standardized metadata outside DRIS. As a human navigation and search interface to the DRIS information hub, we can visualize the integrated metadata as a Pan European research portal.

On an optimistic view, this development from the basic DRIS to the Pan European research portal can take about 2-3 years. There are also very promising lines for further DRIS development.

Once DRIS holds collections of artifacts (content and software types), we can provide euroCRIS members, or more widely the international CRIS community, with a tool to use the artifacts and operate with them in innovative forms. E.g. all available artifacts can be used as a building kit for expressing ideas, for designing logical/structural models and for making proposals of new CRIS content and services.
Such proposals, with the intellectual rights assigned to their authors, can be publicly exposed for implementation by a developer organization or as a CRIS community project on the basis of existing artifacts.

Technically, such logical/structural modeling with DRIS artifacts can be supported by a tool for semantic linking of selected artifacts. Here the semantics carry the sense and explanation of the author's idea, while the linkages express the scope and structure of the artifacts involved. Since DRIS is based on a federation principle (data can be changed only by its owner or an authorized person), the linkage data should be stored outside the linked objects. To support semantic linkages between DRIS artifacts we should create a special system. Initially we call it SLOR (Semantic Linkages Open Repository). However, such a tool can be implemented differently, e.g. it can exist as a subsystem of a CRIS.

Members of the CRIS community can express their creative ideas by semantically linking DRIS artifacts. The created linkages are stored at SLOR. The DRIS information system makes a request to SLOR when a user opens a DRIS object for viewing. DRIS gets back from SLOR the data about existing linkages for the displayed object and can visualize all ingoing/outgoing linkages on the object's web page.

If we have both (1) the DRIS information hub as a virtually integrated source of research content, and (2) a tool to express ideas about reuse and development of the content and software artifacts belonging to some CRIS, we can establish a virtual research environment. In it, all content metadata freely available for harvesting is visualized as a research Data and Information Space (DIS), and all proposed tools and services can be connected with DIS objects using some research e-infrastructure. In this way we get a reincarnation of the Pan European research portal mentioned above, but in a more useful form: as a virtual research environment.
The proposed DRIS development concept can be presented as the following steps:
(1) Basic DRIS -> (2) DRIS+CRIS artifacts -> (3) DRIS information hub (Pan European research portal) -> (4) DRIS+SLOR -> (5) Research DIS -> (6) Research e-Infrastructure -> (7) Virtual Research Environment

DRIS development diagram

[Diagram: Basic DRIS; CRIS artifacts database; DRIS + CRIS artifacts; DRIS information hub + Pan European portal; Semantic Linkages Open Repository, DRIS+SLOR; Research DIS + e-Infrastructure]

Summary table of DRIS development tasks

Columns: DRIS development stages | Proposed initial tasks / for discussion with TG members

1. Basic DRIS
Visualization of already implemented CRIS
  (1) To discuss the fields necessary to register a CRIS at the basic DRIS, see the current input form - http://www.eurocris.org/DRIS_Form.php
  (2) To determine what end-user interfaces the DRIS database should have (to edit records, to search and display records)
  (3) To discuss the strategic goals of euroCRIS and/or the CRIS community related to DRIS and how to make DRIS useful

2. DRIS as a federation of CRIS related information objects
Registration of CRIS components (artifacts) for professional multiple re-use
  (4) To have a list of possible CRIS artifact types/classes (in cooperation with the CERIF TG and Architecture TG). The initially proposed version of the CRIS artifact types/classes is listed below and is published at the forum.
  (5) To define a set of fields for describing CRIS artifacts. To match the format and structure of this set with CERIF entities (cfResProduct?). To add to the set some special fields that specify harvesting/usage parameters of the artifact. What kinds of usage can exist for "software" type artifacts, and how should they be specified in the input form?
  (6) To decide on the practical implementation of the CRIS artifacts database (at the euroCRIS web site or outside) and on the forms of its integration with the DRIS database and web interfaces.
  (7) To set rules for users of the DRIS+CRIS artifacts system. If the extended DRIS is thus a federation of information objects, how can non-owners connect artifacts with DRIS records?

3. DRIS as an Information Hub
Involvement of the "content" type artifacts into professional re-use
  (8) A discussion of the DRIS IH's CERIF-XML interoperation mechanism, main features and added value
  (9) Technical and information architecture of the DRIS IH and its main utilities
  (10) Information Hub's input/output protocols (enhanced OAI-PMH, RSS, etc.)
  (11) Visualization of the DRIS IH content as a Pan European research portal

4. Extended DRIS + SLOR
Involvement of all types of artifacts into professional re-use
  (12) A new model to operate with CRIS artifacts: connecting them by semantic linkages. Can it make DRIS more useful and create added value for the CRIS community?
  (13) What semantic vocabularies will be needed to express possible classes of relationships between information objects, including DRIS records, CRIS artifacts and related objects of external information systems? What kinds of relationships make sense to express and visualize?
  (14) Rational models for the design and practical implementation of the semantic linking instrument: as a SLOR or as a CRIS.
  (15) Extended DRIS as a platform for modeling and designing new CRIS artifacts. A virtual environment for professional interactions among members of the CRIS community, including tools for expressing ideas, for designing logical/structural models and for making proposals of new resources and services.

5. Extended DRIS as platform for Pan European research DIS and research e-infrastructure.
Creation of a Virtual Research Environment
  (16) A model of the Pan European research DIS as a combination of (1) the content from the CERIF integrated research metadata provided by the DRIS Information Hub and (2) the tools/services connected with DIS content and provided by decentralized developers.
  (17) A model of research e-infrastructure as a mechanism to connect designed tools and services with the DIS content.
  (18) To open the richness of CERIF data within the DIS to external information systems (indexing systems, linking systems, etc.), e.g. using the Schema.org approach.

DRIS development stages

1. Basic DRIS

The main functions of the basic DRIS are: (1) to register descriptions of already implemented CRIS in the DRIS database and (2) to visualize this data as a catalogue at the euroCRIS web site.

Proposed initial tasks:
1. To discuss the fields necessary to register a CRIS at the basic DRIS, see the current input form http://www.eurocris.org/DRIS_Form.php
2. To determine what end-user interfaces the DRIS database should have (to edit records, to search and display records)
3. To discuss the strategic goals of euroCRIS and/or the CRIS community related to DRIS and how to make DRIS more useful

2. DRIS as a federation of CRIS related information objects

As the first step towards a more useful DRIS we propose to build a registry of CRIS components (artifacts) as separate information objects. A CRIS artifact is a part of a CRIS's content or software that can be reused in some form by CRIS community members. Registering CRIS artifacts at DRIS means that the artifacts are brought into professional circulation and reuse under certain terms.

2.1. Proposed initial tasks:
4. To have a list of possible CRIS artifact types/classes (in cooperation with the CERIF TG and Architecture TG). The initially proposed version of the CRIS artifact types/classes is listed below and is published at the forum.
5. To define a set of fields for describing CRIS artifacts. To match the format and structure of this set with CERIF entities (cfResProduct?). To add to the set some special fields that specify harvesting/usage parameters of the artifact. What kinds of usage can exist for "software" type artifacts, and how should they be specified in the input form?
6. To decide on the practical implementation of the CRIS artifacts database (at the euroCRIS web site or outside) and on the forms of its integration with the DRIS database and web interfaces.
7.
To set up rules for users of the DRIS + CRIS artifacts system. If the extended DRIS is thus a federation of information objects, how can non-owners connect artifacts with DRIS records?

2.2. DRIS Extensions:

2.2.1. The artifact types:

a "content" type, by a list of CERIF entities
o Projects
o Persons (researchers)
o Organisations (Institutes, Departments...)
o Publications
o Other types of materials
o Products
o Patents
o Events
o Other types of activity
o Facilities (e.g. large laser laboratory)
o Equipment (e.g. x-ray spectrometer)

a "software" type, by typical modules of a CRIS
o Navigation
o Searching
o Visualization
o DIS (data and information space)
o Personalization
o Filtration
o Selection
o Harvesting
o Monitoring, tracing of DIS changes
o Scientometrics
o etc.

2.2.2. CERIF ResultProduct entity as a model and a template to specify CRIS artifacts. A table of proposed main fields with example data

cfResultProduct_Classification: content | person
cfResultProductName: Personal profiles of CEMI RAS staff
cfResultProductDescription (general description): A collection of personal profiles of the staff of the Central Economics and Mathematics Institute of the Russian Academy of Sciences. The profiles include some linkages, e.g. with people's publications. The collection can be harvested by RSS and/or OAI-PMH protocols in DC or CERIF formats.
cfResultProductKeywords: personal profiles, CEMI RAS, social science
cfResultProduct_ResultProduct: Socionet CRIS
OAI-PMH harvesting gateway URL: http://cemi.socionet.ru/oai/ecoorg_org1/oai.cgi?verb=ListRecords&metadataPrefix=cerif&set=person_ekonomika_rus_doahtw
OAI-PMH Description of output data (supported formats, language, etc.): Formats: cerif, oai_dc and some others. Number of records = 106 (on 2012.01.26). All personal data in Russian. Persons' names are also in English.
OAI-PMH Usage terms: Any usage of this collection has to include a link to the source collection.
RSS harvesting gateway URL: http://socionet.ru/cgi/xml/collection.cgi?h=repec:rus:doahtw&rss=srss1.0
RSS Description of output data (supported formats, etc.): Formats: socionet. Number of records = 106 (on 2012.01.26). All personal data in Russian. Persons' names are also in English.
RSS Usage terms: Any usage has to provide a link to the source collection.

2.2.3. CERIF entities to connect artifacts with the description of the parent CRIS at DRIS and with other related objects:
cfResultProduct_Funding
cfResultProduct_ResultProduct
cfResProduct_Service
cfResProduct_Equipment
cfResProduct_Facility
cfResultProduct_Medium
cfResultProduct_Measurement
cfResultProduct_Indicator

2.2.4. Examples

A form to register all types of CRIS artifacts (it needs a login at eurocris.socionet.ru)
http://eurocris.socionet.ru/DRIS-CERIF/Lists/CRIS%20artifacts%20collection/NewForm.aspx?

A view of a registered artifact (public access)
http://eurocris.socionet.ru/DRIS-CERIF/Lists/CRIS%20artifacts%20collection/DispForm.aspx?ID=1

3. DRIS as an Information Hub

The CRIS artifact specification includes fields for the parameters of a harvesting gateway (URL, protocol and available formats). The harvesting gateway data stored at DRIS can be rearranged as an Information Hub (IH) subsystem. The IH is open for input (any organization can register content artifacts and their harvesting gateway parameters) and for output (any user or software agent can freely take the gateway parameters from DRIS to harvest open metadata from the source CRIS). The DRIS information hub will provide a CERIF-XML interoperation mechanism and allow multiple reuse of standardized metadata outside DRIS.

Proposed initial tasks:
8. A discussion of the DRIS IH's CERIF-XML interoperation mechanism, main features and added value
9. Technical and information architecture of the DRIS IH and its main utilities
10. Information Hub's input/output protocols (enhanced OAI-PMH, RSS, etc.)
11. Visualization of the DRIS IH content as a Pan European research portal

3.1.
Information Hub instances: a list of URLs or a database

The simplest version of the IH can exist just as a list of URLs with harvesting parameters, filtered from the extended DRIS. But such an IH cannot guarantee that the CERIF-XML interoperation mechanism works properly, so further DRIS development demands a more complex IH design.

The IH should keep the integrated content in a database. Software operating as the IH input utility automatically collects all available metadata according to the list of URLs described above as the simplest IH. The collected metadata can be improved on the fly to match the CERIF data model and stored in a database. The content of the IH database, which is 100% CERIF compatible, should be synchronized regularly (every day) with the content of the source CRIS artifacts. Other software, operating as the IH output, should allow the content to be retrieved through some query language. The full IH content should be visualized, e.g. as a Pan European research portal.

3.2. Information Hub input/output protocols

The input/output protocols should allow the collected metadata to be retrieved according to two main structures: (1) an organizational hierarchy, and (2) a scientific hierarchy.

(1) community (retrieve all metadata belonging to some community or group of organizations)
    -> organization (retrieve metadata belonging to some organization)
    -> section (data type) (retrieve all metadata belonging to some data type, e.g. all persons)
    -> collection (retrieve all metadata belonging to some collection)
    -> object (retrieve the metadata belonging to some object)

(2) scientific discipline
    -> scientific theme (line)
    -> section (data type)
    -> collection
    -> object

If we use the popular OAI-PMH and/or RSS protocols "as is", we are not able to retrieve metadata according to the listed hierarchical structures. The TG should propose a way to enhance the existing popular protocols or should design a special protocol for input/output operations with the DRIS IH.

3.3.
Conversion utility

Some organizations can have CRIS artifacts with storage models that are not CERIF-compatible. However, such organizations can use a conversion utility to change the format of their CRIS artifacts to CERIF (e.g. on the fly). The conversion utility can work on the side of the organization (behind the harvesting URL) as well as on the side of the DRIS IH (metadata harvested from the organization is processed first by the conversion utility at the IH input). Within the IH this metadata therefore appears as pure CERIF with respect to queries, responses and information exchange.

3.5. Information Hub content visualization: Pan European research portal

Initially DRIS is required to improve the visibility of already implemented CRIS, but when the DRIS IH integrates the metadata of many CRIS, it can also form the technical underpinning for the operation of a research information portal. The DRIS IH is designed for communication with software agents (gateways and harvesters), whereas the Pan European research portal can provide a human navigation and search interface to the integrated metadata.

4. DRIS + SLOR

SLOR (Semantic Linkages Open Repository) is a tool to create semantic linkages between information objects belonging to DRIS (CRIS descriptions, artifacts) and also to external information systems (web pages of person and organization profiles, articles of the Best Practice catalogue and other materials on the web).

Once DRIS holds collections of artifacts (content and software types), we can provide euroCRIS members, or more widely the international CRIS community, with a tool to use the artifacts and operate with them in innovative forms. E.g. all available artifacts can be used as a building kit for expressing ideas, for designing logical/structural models and for making proposals of new CRIS content and services.
Such proposals, with the intellectual rights assigned to their authors, can be publicly exposed for implementation by a developer organization or as a CRIS community project on the basis of existing artifacts.

Technically, such logical/structural modeling with DRIS artifacts can be supported by a tool for semantic linking of selected artifacts. Here the semantics carry the sense and explanation of the author's idea, while the linkages express the scope and structure of the artifacts involved. Since DRIS is based on a federation principle (data can be changed only by its owner or an authorized person), the linkage data should be stored outside the linked objects. To support semantic linkages between DRIS artifacts we have to create a special system. Its initial release is called SLOR (Semantic Linkages Open Repository).

Members of the CRIS community can express their creative ideas by semantically linking DRIS artifacts. The created linkages are stored at SLOR. The DRIS information system makes a request to SLOR when a user opens a DRIS object for viewing. DRIS gets back from SLOR the data about existing linkages for the displayed object and can visualize all ingoing/outgoing linkages on the object's web page.

Proposed initial tasks:
12. A new model to operate with CRIS artifacts: connecting them by semantic linkages. How can it make DRIS more useful and create added value for the CRIS community?
13. What semantic vocabularies will be needed to express possible classes of relationships between information objects, including DRIS records, CRIS artifacts and related objects of external information systems? What kinds of relationships make sense to express and visualize?
14. Rational models for the design and practical implementation of the semantic linking instrument: as a SLOR or as a CRIS.
15. Extended DRIS as a platform for modeling and designing new CRIS artifacts.
A virtual environment for professional interactions among members of the CRIS community, including tools for expressing ideas, for designing logical/structural models and for making proposals of new resources and services.

Below we provide some technical details of SLOR, but as a more general system than one for semantic linking of DRIS information objects, including CRIS artifacts. It is described as a tool to create and manage semantic linkages between any information objects of the research DIS, where DRIS objects are just a subset with specific semantic vocabularies.

4.1. An idea of semantic linking tool

We can design a tool for researchers to express their opinions about existing relationships between DIS objects in visual and computer-readable forms. It can be realized as a specific open repository, which will allow any researcher to create semantic linkages whose semantics carry relationship data between DIS objects; to store, manage and accumulate semantic linkages; and to use navigation and search tools over the set of accumulated linkages. This repository should have an API, e.g. to provide linkage data on request to an external CRIS, so that the CRIS can visualize a network of linkages composed of the articles and other information objects that belong to it. The repository should of course also have advanced interoperability features to exchange linkage data with other repositories and CRIS.

4.2. Semantic vocabularies

In designing this open repository there is a challenge in providing complete and proper linkage semantics that cover all types of relationships which scientists, including the CRIS community, may wish to express when linking objects of the research DIS.
The background for creating the necessary semantic vocabularies includes: the semantic section of CERIF (Common European Research Information Format), the W3C recommendation SKOS (Simple Knowledge Organization System), SWAN (Semantic Web Applications in Neuromedicine), and SPAR (Semantic Publishing and Referencing Ontologies), especially its parts CiTO (Citation Typing Ontology) and DoCo (Document Components Ontology).

As the initial release we propose the following types of scientific relationships and associated semantic vocabularies (the sources of the semantics are in brackets):
• Inference (CiTO): obtain background from, updates, used as evidence, confirms, qualifies
• Impact/usage (CiTO): contains assertion from, uses data from, uses method from, corrects, refutes
• Hierarchical and associative (SKOS, SWAN): broader, narrower, related, alternative to
• Components of scientific composition (DoCo): duplicate, revised, etc.

There are also other types of relationships, like "professional opinions", "personal-organizational relationships" and so on.

4.3. Proposed SLOR design

The Semantic Linkages Open Repository (SLOR) is a tool to expand the current research DIS with a new type of object: the "linkage". The "linkage" data type is designed to carry the subjective opinions of scientists about the relationship that exists between any pair of DIS objects, including "person", "organization", "project", "research result", "event", "artifact" and some others.

The scientific relationships rendered include: (1) relationships between the various research and development (R&D) outputs, like inference, usage, impact, comparison, etc.; (2) relationships between elements of the set {scientists, organizations}; (3) relationships between R&D outputs on the one hand and elements of the set {scientists, organizations} on the other.

The "linkage" data type is based on the CERIF Link entity specification. It allows a source and a target object to be specified by their IDs.
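
The "linkage" record described in this section can be illustrated with a minimal sketch. It is not the SLOR implementation: all field names, the vocabulary keys and the helper `linkages_for` are hypothetical, chosen only for illustration. The vocabulary terms are the ones proposed in section 4.2, and the ID, comment and creator fields correspond to the SLOR extensions of the CERIF Link entity model listed in this section.

```python
from dataclasses import dataclass
from typing import Dict, List

# Relationship types and terms from the initial release proposed in section 4.2.
# The dictionary keys are hypothetical labels, not official vocabulary names.
VOCABULARIES: Dict[str, List[str]] = {
    "inference": ["obtain background from", "updates", "used as evidence",
                  "confirms", "qualifies"],
    "impact/usage": ["contains assertion from", "uses data from",
                     "uses method from", "corrects", "refutes"],
    "hierarchical/associative": ["broader", "narrower", "related",
                                 "alternative to"],
    "composition": ["duplicate", "revised"],
}


@dataclass(frozen=True)
class Linkage:
    """A directed semantic linkage between two DIS objects (assumed layout)."""
    linkage_id: str      # the linkage is itself a DIS object with a unique ID
    source_id: str       # ID of the source object
    target_id: str       # ID of the target object
    vocabulary: str      # which semantic vocabulary the value comes from
    semantic_value: str  # type of relationship between source and target
    creator: str         # who created the linkage
    comment: str = ""    # optional free-text comment

    def __post_init__(self) -> None:
        # Reject semantic values that are not part of the chosen vocabulary.
        terms = VOCABULARIES.get(self.vocabulary)
        if terms is None or self.semantic_value not in terms:
            raise ValueError(
                f"{self.semantic_value!r} is not a term of {self.vocabulary!r}"
            )


def linkages_for(object_id: str, linkages: List[Linkage]) -> Dict[str, List[Linkage]]:
    """Split the linkages touching one object into outgoing and ingoing."""
    return {
        "outgoing": [l for l in linkages if l.source_id == object_id],
        "ingoing": [l for l in linkages if l.target_id == object_id],
    }


# Hypothetical object IDs, for illustration only.
links = [
    Linkage("link-1", "paper-A", "dataset-B", "impact/usage",
            "uses data from", creator="person-X"),
    Linkage("link-2", "paper-C", "paper-A", "inference",
            "confirms", creator="person-Y", comment="independent replication"),
]
result = linkages_for("paper-A", links)
```

A lookup of this kind is what a CRIS or DRIS page would need in order to display the ingoing and outgoing linkages of the object being viewed; how SLOR would actually expose it through its API is left open by this paper.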
All linkages are thus directed from a source object to a target object. The pair of object IDs is complemented by a semantic value, which characterizes the type of relationship between the objects. Possible semantic values are organized as a set of semantic vocabularies according to the types of scientific relationships.

For practical use in SLOR we extended the initial CERIF Link entity model by adding:
• the linkage's ID, since the linkage exists as a research DIS object and has to have a unique ID;
• a field for comments;
• the creator of the linkage.

SLOR openness means: a) it is free to use, i.e. any scientist can use it to create semantic linkages between any available objects of the research DIS (proposals are moderated); b) all semantic linkages in SLOR are open for harvesting and external use by other research information systems; c) the multiple semantic vocabularies are open for replenishment and development (proposals are moderated); d) the set of DIS data types whose objects can be linked in SLOR can be expanded.

SLOR functionality includes a personal zone, a public portal for navigation and searching over the accumulated linkages, and some other services (information hub, monitoring of linkage changes, sending notifications, building scientometrics, etc.).

For better SLOR navigation and searching, the IDs of the source and target objects are supplemented in the repository by the name/title and data type of the objects. To get this information from the DIS automatically, we assume that the metadata of source/target objects is available through RSS or OAI-PMH protocols in CERIF, ReDIF or some other popular format.

Any external CRIS can automatically check whether SLOR holds linkages for its own information objects. If so, the linkage data can be harvested from SLOR to the CRIS using the API. The external CRIS can then visualize a network of linkages composed of the articles and other information objects that belong to it.

4.4. SLOR benefits

With SLOR researchers gain a new tool and a new professional dimension for scientific creativity. In addition to the traditional ways of scientific work, they can now express new scientific knowledge about relationships between separate research results by building multilayer networks of semantic linkages.

Scientists can now create their research publications in the style of semantic networks, e.g. by linking separate research objects, "units of thought" or other types of research outputs, which can belong to different authors. As a result, SLOR improves the mechanism of scientific circulation, and research outputs (nodes of semantic networks) can be easily reused by the research community.

SLOR notifies scientists when their research outputs are linked or used, as well as about changes in research objects that they have linked with their own outputs. This improves research communication and increases the efficiency of research work at large.

SLOR provides new scientometrics based on qualitative data about scientific relationships, like impact, usage and others. This can help with research assessment and evaluation, and it improves the professional signaling system of the scientific community.

5. Extended DRIS as platform of research DIS and research e-infrastructure.

If we have both (1) the DRIS information hub as a virtually integrated source of research content, and (2) a tool to express ideas about reuse and development of the content and software artifacts belonging to some CRIS, we can establish a virtual research environment. In it, all content metadata freely available for harvesting is visualized as a research Data and Information Space (DIS), and all proposed tools and services can be connected with DIS objects using some research e-infrastructure. In this way we get a reincarnation of the Pan European research portal mentioned above, but in a more useful form: as a virtual research environment.

Proposed initial tasks:
16. A model of the Pan European research DIS as a combination of (1) the content from the CERIF integrated research metadata provided by the DRIS Information Hub and (2) the tools/services connected with DIS content and provided by decentralized developers.
17. A model of research e-infrastructure as a mechanism to connect designed tools and services with the DIS content.
18. To open the richness of CERIF data within the DIS to external information systems (indexing systems, linking systems, etc.), e.g. using the Schema.org approach.

Summary table of expected benefits from extended DRIS

Columns: DRIS development stages / Products | Expected Benefits / for discussion with TG members

1. Basic DRIS
  A visualization of already implemented CRIS

2. DRIS as a federation of CRIS related information objects.
Product 2: Extended DRIS + CRIS artifacts
  A registration of CRIS components (artifacts) for professional multiple re-use
  A disclosure of technical details about the content and software of CRIS registered at DRIS.
  A search for, and establishment of, rules and models of professional re-use of CRIS artifacts on behalf of euroCRIS members and the CRIS community at large.

3. DRIS as an Information Hub
Product 3: DRIS IH
  Involvement of the "content" type artifacts into professional re-use
  Practical implementation and running of the CERIF-XML interoperation mechanism
  Creation of input/output protocols for the CERIF-XML interoperation mechanism. Adjustment of OAI-PMH and RSS for CERIF-XML interoperation.
  Running of the Pan European research portal

4. Extended DRIS + SLOR
Product 4: DRIS+SLOR
  Involvement of all types of artifacts into professional re-use
  Extended DRIS as a platform for modeling and designing new CRIS artifacts. A virtual environment for professional interactions among members of the CRIS community, including tools for expressing ideas, for designing logical/structural models and for making proposals of new resources and services.

5. Extended DRIS as platform for Pan European research DIS and research e-infrastructure.
  Creation of a Virtual Research Environment