CRIS&OAR for research information management

I.Filozova
JINR LIT,
University “DUBNA”
Dubna, Russia
SCHOOL ON JINR/CERN GRID AND ADVANCED INFORMATION SYSTEMS
Dubna
NOVEMBER 2-6, 2015
Acronyms
CRIS&OAR
CRIS: Current Research Information System
OAR: Open Access Repository
[http://jds-test3.jinr.ru]
Mission of a scientific organization:
achieving scientific results and meeting the needs of the scientific community
New Knowledge Generation
Knowledge is fixed in the images and signs of natural and artificial languages.
Scientific Activity:
• Search for Available Information
• Data Processing & Data Generation
• Knowledge Generation
Publications:
• printed articles
• digital archives
• repositories
Data: Tables, Plots, Data Bases, etc.
Journal Crisis
End of the 1990s: the cost of subscriptions to scientific journals grew 2-3 times faster than the budgets of academic libraries and inflation.
Price policy
• cost of a one-year subscription ≥ 500 $
• average cost of an annual subscription to a chemistry journal ≥ 3000 $
• some journals ≥ 10 000 $
Journal                                        Publisher              Year   Price, $
Journal of Comp. and Applied Mathematics       Elsevier               2008   4 727
Applied Mathematics and Mechanics (6 issues)   Springer               2016   5 606
Applied Physics A                              Springer               2008   4 989
Journal of Fluid Mechanics                     Cambridge Univ. Press  2008   3 200
Annals of Physics                              Elsevier               2016   3 928
Biochimica & Biophysica Acta                   Elsevier               2012   20 930
Materials Science & Engineering A, B, C, & R: 17 986 $ (2008), 23 345 $ (2016)
For comparison:
2015 Volkswagen Golf 1.6 AT, new: 20 385 $
Machu Picchu: 3 850 $
Open Access (OA) to Research
What about copyrights?
• OA does not cancel copyright and does not contradict it.
How is OA realized?
• public scientific archives and repositories: the Green road
• publication in open access journals: the Gold road
Where does the OA idea come from?
1. Budapest Open Access Initiative (http://www.budapestopenaccessinitiative.org/);
2. Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities (http://openaccess.mpg.de/Berlin-Declaration).
Open Access Benefits
Scientists and Researchers:
• wider readership and greater visibility;
• increased citation of publications;
• scientific impact;
• growth of the author's popularity and securing of scientific priority.
Organization:
• management of its digital resources;
• increased scientific prestige of the organization.
Society:
• return on investment in research;
• removal of barriers to information sharing;
• creation of additional information services for different categories of users.
OAI Protocol for Metadata Harvesting (OAI-PMH)
Basis: HTTP
Superstructure: OAI-PMH
2 types of requests:
1. SELECT ALL RECORDS;
2. SELECT RECORDS WHERE <criteria>
6 commands:
GetRecord, Identify, ListIdentifiers, ListMetadataFormats, ListRecords, ListSets
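The two request types and six commands can be illustrated with a minimal Python sketch that only builds the HTTP GET URLs and does not contact any server. The endpoint address and the set name "articles" are hypothetical; the verb and argument names (metadataPrefix, from, set) are those of OAI-PMH 2.0.

```python
from urllib.parse import urlencode

# The six OAI-PMH commands (called "verbs" in the protocol).
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def build_oai_request(base_url, verb, **arguments):
    """Return the HTTP GET URL for one OAI-PMH request."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    for name, value in arguments.items():
        # 'from' is a Python keyword, so callers pass it as 'from_'.
        params[name.rstrip("_")] = value
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai2d"

# 1. "Select all records", delivered as Dublin Core.
all_records = build_oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

# 2. "Select records where <criteria>": restrict by date and by set.
selected = build_oai_request(
    BASE_URL, "ListRecords",
    metadataPrefix="oai_dc", from_="2015-01-01", set="articles",
)

print(all_records)
print(selected)
```

A harvester would send such URLs over HTTP and parse the XML responses; that part is left out of the sketch.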
Information Model OAI-PMH
RESOURCE ↔ ELEMENT {ID_RECORD; RECORDS}
RESOURCE: IDENTIFIER, RECORDS
METADATA SETS: Dublin Core, MARC21, MARCXML, User Metadata Set, ...
OAI Repositories over the World
Country        Archives
USA            693
UK             231
Germany        199
Japan          156
Spain          156
Brazil         136
India          102
China          90
France         87
Canada         81
Ukraine        73
Australia      75
Italy          77
Taiwan         69
Russia         53
Portugal       48
Colombia       47
Sweden         45
South Africa   40
Malaysia       36
Netherlands    35
Belgium        28
Greece         21
Number of Repositories: 4053
Number of Records: ~39,000,000
Source: Registry of Open Access Repositories (ROAR), http://roar.eprints.org
Open Access Statistics
Repository type
Software to create and manage OARs
Software     Number of repositories
DSpace       1579
EPrints      567
Bepress      366
OPUS         72
Invenio      19
Greenstone   22
Fedora       57
OAR Example 1
OAR Example 2
JINR Document Server: http://jds.jinr.ru/
Research Information
Data/Metadata or Information about:
• Scientists
• Project Managers
• Ongoing and Completed Projects
• Research Departments
• Funding Organisations and Programmes
• Research Results
• Publications
• Equipment
• their timely Relationships (Semantics)
Who needs Research Information?
What is a CRIS?
Current Research Information System = CRIS
… that means
• Timeliness
• Vitality
… information about
• People +
• Organisations +
• Projects +
• Funding Programmes +
• Research Results +
• …
… driven by
• A Concept
• A Model
… incorporated as an
• Implementation (ICT)
An integrated approach towards managing research information
CERIF Model
Common European Research Information Format
Instance Diagram
[Diagram: Person A, OrgUnit M, OrgUnit N, OrgUnit O, Project P and Publication X connected by typed relationships (member, employee, part of, project leader, author, owns IPR). Data sources shown: HR System, Project and Finance Management, Repository, webpages.]
CERIF Features
(1) data model (data-centric)
(2) allows for a (metadata) representation of
–research entities
–their activities / interconnections (research)
–their output (results)
(3) allows for high flexibility with formal (semantic)
relationships
(4) enables quality maintenance, archiving, access and
interchange of research information
(5) supports knowledge transfer to decision makers,
for research evaluation, research managers, strategists,
researchers, editors, the general public
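Features (2) and (3) can be sketched in Python as entities joined by typed, time-stamped links. This is a deliberately simplified illustration and not the actual CERIF schema: the class names, the fields and the dates are assumptions, and only the entity names (Person A, Project P, Publication X) and the roles come from the instance diagram.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """A research entity: person, organisational unit, project or result."""
    kind: str
    name: str


@dataclass(frozen=True)
class Link:
    """A typed, time-stamped relationship between two entities."""
    source: Entity
    role: str
    target: Entity
    start: date
    end: Optional[date] = None  # None means the link is still valid

    def active_on(self, day: date) -> bool:
        return self.start <= day and (self.end is None or day <= self.end)


person_a = Entity("Person", "Person A")
project_p = Entity("Project", "Project P")
publication_x = Entity("Publication", "Publication X")

# Illustrative links; the dates are invented for the example.
links = [
    Link(person_a, "project leader", project_p, date(2012, 1, 1), date(2014, 12, 31)),
    Link(person_a, "author", publication_x, date(2013, 6, 1)),
]


def roles_of(entity, day):
    """List (role, target name) pairs for the links active on the given day."""
    return [(link.role, link.target.name) for link in links
            if link.source == entity and link.active_on(day)]


print(roles_of(person_a, date(2013, 7, 1)))
print(roles_of(person_a, date(2015, 11, 2)))
```

Keeping the role and validity period on the link rather than on the entity is what lets the same person hold different roles at different times without duplicating the person record.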
CRIS Example 1
CRIS Example 2
ИСТИНА (https://istina.msu.ru/)
CRIS Example 3
Personal INformation System JINR
PIN
CRIS&OAR Challenge
Collaboration of researchers, administration
and librarians
CRIS and OARs should join forces to deliver
the best possible services
Operational Layer
Strategic Layer
Current Research Information Systems (CRIS) & Open Access Repositories (OAR)
CRIS
Management:
• Financial information
• Staff information
• R&D organisation
Paradigm: administrative, comprehensive, integrative, person-centric, analytics
OAR
Management:
• Bibliographic Data
• Full-Text Documents
• Authoritative Data Resources
Paradigm: public, file-centric, rights, preservation, distributed
Commonalities:
• Bibliographic Information
• Affiliation
• Project Information
Aggregative Approach
– Integrating with institutional HRM, project and other systems: sharing and re-using resources
• Record the R&D (Research and Development) activity
• Cover projects, people (expertise), organizational structure, R&D outputs, events, facilities and equipment
• Collect and preserve the R&D outputs
• Provide a set of services for the collaboration members to manage and distribute digital resources.
Need Curation Processes & Human Responsibilities
Curation View:
• People: staff manager
• Projects: research project manager
• Materials & Equipment: facility manager
• Bibliographic Information: bibliography specialist, librarian, content manager, identity manager
• Finance: financial officer
Normalize as much as possible:
Authority Records*
*search elements of bibliographic records
+ More qualitative, consistent data
+ Minimizing the data input by end-users
Authority Control
identify objects and concepts uniquely
• Variety of Authorities: People, Institutes, Grants, Experiments, Projects, Journals, …
• Variety of Identifiers: DOI, ORCID, ...
• Variety of Linkages: n:m relations, Vertical linkages, Horizontal linkages
• History Tracking: Predecessors/Successors
Authority Control
Source Data:
• CRIS & OAR Systems
• Bibliographic Databases
• Vocabularies, Ontologies, ORCID/AuthorClaim and other author identification systems
Tool: Authority Control
1. Accounting for all name variants
2. Disambiguation of authoritative data in information search and submission
Result: Relevant Information about R&D activity
• Lists of Publications
• Scientific Reporting
• Bibliometrics & Scientometrics
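The "accounting for all name variants" step can be sketched in Python as a small authority file that maps every known form of a name to one identifier. This is only an illustration under stated assumptions: the class, the identifier "AUTHOR-1" and the personal names are hypothetical, and the normalization handles Latin-script names only.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(name):
    """Reduce a name to a comparison key: lowercase, no accents or punctuation, tokens sorted."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    letters_only = re.sub(r"[^a-z ]", " ", no_accents.lower())
    # Sorting the tokens makes "I. I. Ivanov" and "Ivanov I.I." yield the same key.
    return " ".join(sorted(letters_only.split()))


class AuthorityFile:
    """Maps name variants to a single authority identifier."""

    def __init__(self):
        self._variant_to_id = {}

    def add(self, authority_id, preferred, variants=()):
        for form in (preferred, *variants):
            self._variant_to_id[normalize(form)] = authority_id

    def resolve(self, name):
        """Return the authority identifier for a name, or None if it is unknown."""
        return self._variant_to_id.get(normalize(name))


authority = AuthorityFile()
authority.add("AUTHOR-1", "Ivanov, Ivan I.", variants=["I. I. Ivanov"])

publications = [
    ("Paper 1", "I. I. Ivanov"),
    ("Paper 2", "Ivanov I.I."),
    ("Paper 3", "Petrov, P."),
]

# Group publication titles by resolved author; unresolved names end up under None.
by_author = defaultdict(list)
for title, author_name in publications:
    by_author[authority.resolve(author_name)].append(title)

print(dict(by_author))
```

Names that resolve to None are the ones a curator would need to review, which is where the human responsibilities of the curation view come in.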
JINR CRIS & OAR Systems
JDS: JINR Document Server
• Open Access Repository of materials concerning the R&D activity
• Invenio, ©CERN
• Viewpoint: from file
PIN: Personal Information System
• Staff information: Employment profiles, Bibliographic Archive, Projects' Information
• ©JINR
• Viewpoint: from person
IDC: Integrated Digital Conferencing System
• Scientific activities management: entire lifecycle for conferences, meetings, lectures
• Indico, ©CERN
• Viewpoint: from event
JINR Document Server (JDS)
JDS was created and is being developed as an institutional repository with the following content:
1. Research and science-related documents:
– Publications co-authored by JINR researchers;
– Archive documents that describe all the essential stages of JINR research activity;
2. Documents providing informational support for the scientific and technological research performed at JINR.
JDS: Information Services
• Search and navigation,
• Creation of user groups,
• Saving search results,
• Individual and group bookshelves,
• Manuscript deposition,
• Discussions on the publications,
• Sending out alerts and messages.
Invenio SOFT
• Unix-like OS: GNU/Linux distributions Debian, Gentoo, Scientific Linux (RHEL-based), Ubuntu
• HTML, CSS, JS
• Python 2.7.5+
• MySQL
• Redis
Architecture
http://jds.jinr.ru
Trees
Collections
Subcollections
Collection Books
Information Card of Resource
Attachment to
Collection
Authority Control Realization
Solved by: MARC21 Authorities + Invenio v1.2.1 API
MARC21 authorities
• Repeatable linking fields (fields 4xx, 5xx)
• Horizontal linking (subfield $w: $wa - predecessor, $wb - successor)
• Vertical linking (subfield $w: $wt - parent)
• Repeatable System Control Number (field 035)
• Repeatable Standard Technical Report Number (field 027)
Module BibAuthority
• Enriching bibliographic data with data from authority records
• Re-indexing bibliographic records that contain links to recently updated authority records
• Cross-referencing between MARC records ($0 subfields)
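The horizontal and vertical linking listed above can be sketched in Python with authority records held as plain dictionaries. This is not the BibAuthority API: the record identifiers, the data layout and the "Former Laboratory" heading are hypothetical, and only the $w codes (a, b, t) and the JINR and LIT headings come from the slides.

```python
# Control codes carried in subfield $w of a linking field, as listed above.
W_CODES = {"a": "predecessor", "b": "successor", "t": "parent"}

# Each authority record: a heading plus links given as (w_code, target record id).
# Record ids and the "Former Laboratory" heading are hypothetical.
AUTHORITIES = {
    "INST-1": {"heading": "JINR", "links": []},
    "INST-2": {"heading": "LIT", "links": [("t", "INST-1"), ("a", "INST-3")]},
    "INST-3": {"heading": "Former Laboratory", "links": [("b", "INST-2")]},
}


def linked(record_id, relation):
    """Return the ids of records linked to record_id by the given relation."""
    return [target for code, target in AUTHORITIES[record_id]["links"]
            if W_CODES.get(code) == relation]


def ancestors(record_id):
    """Follow vertical ($w t) links upwards and return the chain of headings."""
    chain, seen = [], {record_id}
    parents = linked(record_id, "parent")
    while parents and parents[0] not in seen:
        current = parents[0]
        seen.add(current)
        chain.append(AUTHORITIES[current]["heading"])
        parents = linked(current, "parent")
    return chain


def all_names(record_id):
    """Return the heading plus the headings of all predecessors ($w a links)."""
    names, seen = [AUTHORITIES[record_id]["heading"]], {record_id}
    todo = linked(record_id, "predecessor")
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        names.append(AUTHORITIES[current]["heading"])
        todo.extend(linked(current, "predecessor"))
    return names


print(ancestors("INST-2"))
print(all_names("INST-2"))
```

Searching under every name returned by all_names is one way history tracking can keep publications findable after an institute is renamed.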
Collection Authorities
http://jds-test3.jinr.ru
Collection Institutes.
Record JINR
Record LIT. Detailed Information
Institute →Publication
Collection People.
Author → Publication
Detailed Information about Author
Collection code: MARC tag 980 defines which documents belong to a given collection
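A minimal Python sketch of selecting a collection by tag 980 follows. The record layout, the titles and the collection codes "ARTICLE", "THESIS" and "CONFERENCE" are hypothetical, and the use of subfield $a with repeated 980 fields is an assumption about the local configuration.

```python
# Each record maps a MARC tag to a list of field occurrences;
# each occurrence maps subfield codes to values.
records = [
    {"245": [{"a": "Paper on grid computing"}], "980": [{"a": "ARTICLE"}]},
    {"245": [{"a": "Thesis on data management"}], "980": [{"a": "THESIS"}]},
    {"245": [{"a": "Conference paper"}], "980": [{"a": "ARTICLE"}, {"a": "CONFERENCE"}]},
]


def in_collection(record, code):
    """True if any 980 field of the record carries the collection code in $a."""
    return any(field.get("a") == code for field in record.get("980", []))


def collection_titles(code):
    """Return the titles (245 $a) of all records that belong to the collection."""
    return [record["245"][0]["a"] for record in records if in_collection(record, code)]


print(collection_titles("ARTICLE"))
print(collection_titles("THESIS"))
```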
Experiment → Publication
Grant → Author → Publication
Thesaurus
Repository: a place for storing and maintaining any data.
Archive: a collection of information resources plus a classification system (catalog).
Knowledge: a form in which the results of human cognitive activity exist and are systematized.
Knowledge (of a subject): a confident understanding of a subject and the ability to deal with it and use it to achieve some goals.
Missing knowledge: knowledge known to humanity but unknown to a given person at the current moment (for example, a student facing a new subject in an educational program).
Knowledge in the wide sense: a subjective image of reality in the form of concepts and ideas.
Knowledge in the narrow sense: the possession of verified information (answers to questions) that allows one to solve a problem.
Knowledge in the theory of artificial intelligence (AI) and expert systems: information and inference rules about the world, properties of objects, and patterns of processes and phenomena, as well as rules for using them in decision-making.
New knowledge: information about the existence of objects or their properties, or of real processes and phenomena, previously unknown to science and not included in the existing system of human ideas about the world.
Open Access (OA) to Research: a mode of scientific communication in which authors exercise their right to publish their work in such a manner that any person can access it from any place and at any time of their own choosing.
Open Archives Initiative (OAI): an organization that develops and applies technical interoperability standards for archives to share catalog information (metadata).
Self-archiving: the deposit of digital documents (metadata + full text) in an OAI-compliant archive.
“Proxy” self-archiving: a deposit made on behalf of authors who feel that they are personally unable (too busy or technically incapable) to self-archive for themselves.
Harvesting: automatic gathering of metadata between repositories.
OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting.
Metadata: structured data that describes the characteristics of a resource (“An Introduction to Metadata”, by Chris Taylor, University of Queensland).
Data about Data
Book:
Title: Pushkin's Fairy Tales
Date of Publication: 2012
Author: Alexander Pushkin
Editor: Williams Paul
Translator: Elton Oliver, Krup Jacob
Publisher: Bright City
Structure:
• Type of Resource
• Title
• Description
• Source
• Date
• Author
• Creator
•…
MARC21: an international standard for bibliographic data.
A MARC bibliographic record consists of three main components: the Leader, the Directory, and
the variable fields (http://www.loc.gov/marc/bibliographic/).
00X: Control Fields
01X-09X: Numbers and Code Fields
1XX: Main Entry Fields
20X-24X: Title and Title-Related Fields
25X-28X: Edition, Imprint, Etc. Fields
3XX: Physical Description, Etc. Fields
4XX: Series Statement Fields
5XX: Note Fields
6XX: Subject Access Fields
70X-75X: Added Entry Fields
76X-78X: Linking Entry Fields
80X-83X: Series Added Entry Fields
841-88X: Holdings, Location, Alternate Graphics, Etc. Fields
MARC: MAchine-Readable Cataloguing
Example fields:
035 - System Control Number (Repeatable)
100 - Personal Name (Not Repeatable)
245 - Title Statement (Not Repeatable)
700 - Added Entry, Personal Name (Repeatable)
Figure labels: Fields, SubFields, Values
XML: EXtensible Markup Language, a metalanguage (a language for describing other languages) and a universal format for structured documents and data, derived from SGML (Standard Generalized Markup Language). http://www.w3.org/XML/
Example:
<?xml version="1.0" encoding="utf-8"?>   <!-- Prolog -->
<PRODUCTS>                               <!-- Root Element -->
  <PRODUCT>
    <TITLE>Product #1</TITLE>            <!-- Opening Tag, Element Content, Closing Tag -->
    <PRICE>10.00</PRICE>
  </PRODUCT>
  <PRODUCT>
    <TITLE>Product #2</TITLE>
    <PRICE>20.00</PRICE>
  </PRODUCT>
</PRODUCTS>
MARCXML: a framework for working with MARC data in an XML environment (http://www.loc.gov/standards/marcxml/)
Tag datafield = MARC field
Tag subfield = MARC subfield
Element Content = MARC subfield values
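The mapping above can be illustrated with a short Python sketch that reads a MARCXML record using the standard library. The sample record is illustrative only: it reuses the book from the metadata example, and the choice of indicators is an assumption; the namespace and the datafield/subfield element names are those of the MARCXML schema.

```python
import xml.etree.ElementTree as ET

# Illustrative record built from the book example; not taken from a real catalogue.
MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Alexander Pushkin</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Pushkin's Fairy Tales</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Elton Oliver</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Krup Jacob</subfield>
  </datafield>
</record>"""

# Prefix-to-namespace mapping used in the search paths below.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def subfield_values(record, tag, code):
    """Return the values of subfield `code` in every occurrence of field `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [element.text for element in record.findall(path, NS)]


record = ET.fromstring(MARCXML)
print(subfield_values(record, "245", "a"))  # title statement (not repeatable)
print(subfield_values(record, "700", "a"))  # added entries (repeatable)
```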
Institutional Repository
[Diagram: Open Access Idea; Digital Libraries Tools; Scientific and Educational Activity]
Institutional Repositories in the form of Open Access
I. Digital Collection. Collection and preservation
of intellectual output of organization.
II. Set of services for the collaboration members in
order to manage and distribute digital resources.
CERIF: Common European Research Information Format
1) CERIF is an EU Recommendation to Member States (http://cordis.europa.eu/cerif/)
2) The European Commission (EC) has authorised euroCRIS to maintain and develop CERIF and its usage (http://www.eurocris.org/cerif/cerif-releases/)