CRIS&OAR for Research Information Management
I. Filozova
JINR LIT, University "Dubna", Dubna, Russia
SCHOOL ON JINR/CERN GRID AND ADVANCED INFORMATION SYSTEMS
Dubna, November 2-6, 2015

Acronyms: CRIS&OAR
CRIS: Current Research Information System
OAR: Open Access Repository
[http://jds-test3.jinr.ru]

Mission of a scientific organization: achieving scientific results and meeting the needs of the scientific community.

New Knowledge Generation
Knowledge is fixed in images and signs of natural and artificial languages.
Scientific Activity:
• Search for Available Information
• Data Processing & Data Generation
• Knowledge Generation
Outputs:
• Publications: printed articles, digital archives, repositories
• Tables, plots, databases, etc.

Journal Crisis
End of the 1990s: the cost of subscriptions to scientific journals grew 2-3 times faster than both academic library budgets and inflation.
Price policy:
• 1-year cost ≥ 500 $
• average annual subscription to a chemistry journal ≥ 3,000 $
• some journals ≥ 10,000 $

Journal | Publisher | Year | Price, $
Journal of Comp. and Applied Mathematics | Elsevier | 2008 | 4,727
Applied Mathematics and Mechanics (6 issues) | Springer | 2016 | 5,606
Applied Physics A | Springer | 2008 | 4,989
Journal of Fluid Mechanics | Cambridge Univ. Press | 2008 | 3,200
Annals of Physics | Elsevier | 2016 | 3,928
Biochimica & Biophysica Acta | Elsevier | 2012 | 20,930
Materials Science & Engineering A, B, C, & R | (not given) | 2008 | 17,986
Materials Science & Engineering A, B, C, & R | (not given) | 2016 | 23,345

For comparison:
• 2015 Volkswagen Golf 1.6 AT, new: 20,385 $
• Machu Picchu: 3,850 $

Open Access (OA) to Research
What about copyright?
• OA does not cancel copyright and does not contradict it.
How is OA realized?
• public scientific archives and repositories: the Green road
• publication in open access journals: the Gold road
Where does the OA idea come from?
1. Budapest Open Access Initiative declaration (http://www.budapestopenaccessinitiative.org/);
2. Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities (http://openaccess.mpg.de/Berlin-Declaration).

Open Access Benefits
Scientists and Researchers:
• a wider readership, with publications read more often;
• more citations of publications;
• greater scientific impact;
• growing author recognition and securing of scientific priority.
Organization:
• management of its digital resources;
• increased scientific prestige of the organization.
Society:
• return on investment in research;
• removal of barriers to information sharing;
• creation of additional information services for different categories of users.

OAI Protocol for Metadata Harvesting
Basis: HTTP
Superstructure: OAI-PMH
2 types of requests:
1. SELECT ALL RECORDS;
2. SELECT RECORDS WHERE <criteria>
6 commands: GetRecord, Identify, ListIdentifiers, ListMetadataFormats, ListRecords, ListSets

Information Model of OAI-PMH
RESOURCE ↔ ELEMENT {ID_RECORD; RECORDS}
• RESOURCE: IDENTIFIER
• METADATA SETS (RECORDS): Dublin Core, MARC21, MARCXML, User Metadata Set, ...

OAI Repositories over the World
Number of Repositories: 4053
Number of Records: ~ 39,000,000

Country | Archives
USA | 693
UK | 231
Germany | 199
Japan | 156
Spain | 156
Brazil | 136
India | 102
China | 90
France | 87
Canada | 81
Ukraine | 73
Australia | 75
Italy | 77
Taiwan | 69
Russia | 53
Portugal | 48
Colombia | 47
Sweden | 45
South Africa | 40
Malaysia | 36
Netherlands | 35
Belgium | 28
Greece | 21

According to the Registry of Open Access Repositories (ROAR), http://roar.eprints.org

Open Access Statistics
[Figure: repository type]
Software to create and manage OARs:

Software | Number of repositories
DSpace | 1579
EPrints | 567
Bepress | 366
OPUS | 72
Invenio | 19
Greenstone | 22
Fedora | 57

OAR Example 1

OAR Example 2
JINR Document Server, http://jds.jinr.ru/

Research Information
Data/metadata or information about:
• Scientists
• Project Managers
• Ongoing and Completed Projects
• Research Departments
• Funding Organisations and Programmes
• Research Results
• Publications
• Equipment
• their time-dependent Relationships (Semantics)

Who needs Research Information?

What is a CRIS?
Current Research Information System = CRIS
Current … that means:
• Timeliness
• Vitality
Research Information … information about:
• People +
• Organisations +
• Projects +
• Funding Programmes +
• Research Results +
• …
System … driven by:
• A Concept
• A Model
… incorporated as an:
• Implementation (ICT)
An integrated approach towards managing research information.

CERIF Model
Common European Research Information Format
[Instance diagram. Entities: Person A; OrgUnit M, OrgUnit N, OrgUnit O; Project P; Publication X. Relationship labels: member, part of, employee, project leader, author, owns IPR. Source systems: HR System, webpages, Project Finance Management, Repository.]

CERIF Features
(1) a data model (data-centric)
(2) allows for a (metadata) representation of
  - research entities
  - their activities / interconnections (research)
  - their output (results)
(3) allows for high flexibility with formal (semantic) relationships
(4) enables quality maintenance, archiving, access and interchange of research information
(5) supports knowledge transfer to decision makers, research evaluators, research managers, strategists, researchers, editors and the general public

CRIS Example 1

CRIS Example 2
ИСТИНА (https://istina.msu.ru/)

CRIS Example 3
JINR PIN (Personal INformation System)

CRIS&OAR Challenge
• Collaboration of researchers, administration and librarians
• CRIS and OARs should join forces to deliver the best possible services
• Operational Layer / Strategic Layer

Current Research Information Systems (CRIS) & Open Access Repositories (OAR)
CRIS
• Management of: financial information, staff information, R&D organisation
• Paradigm: administrative, comprehensive, integrative, person-centric, analytics
OAR
• Management of: bibliographic data, full-text documents, authoritative data resources
• Paradigm: public, file-centric, rights, preservation, distributed
Commonalities:
• Bibliographic Information
• Affiliation
• Project Information

• Aggregative approach: integration with institutional HRM, project and other systems; sharing and re-using resources
• Record the R&D (Research and Development) activity
• Cover projects, people (expertise), organizational structure, R&D outputs, events, facilities and equipment
• Collect and preserve the R&D outputs
• Provide a set of services that lets collaboration members manage and distribute digital resources

Curation View
Curation processes and human responsibilities are needed:

Area | Responsible
People | staff manager
Projects | research project manager
Materials & Equipment | facility manager
Bibliographic Information | bibliography specialist, librarian, content manager, identity manager
Finance | financial officer

Normalize as much as possible: Authority Records*
*search elements of bibliographic records
+ Higher-quality, more consistent data
+ Less data entry by end users

Authority Control
Identify objects and concepts uniquely.
• Variety of authorities: People, Institutes, Grants, Experiments, Projects, Journals, …
• Variety of identifiers: DOI, ORCID, ...
• Variety of linkages: n:m relations, vertical linkages, horizontal linkages
• History tracking: predecessors/successors

Authority Control
Source Data:
• CRIS & OAR systems
• Bibliographic databases
• Vocabularies, ontologies
• ORCID/AuthorClaim and other author identification systems
Tool: Authority Control
1. Accounting for all name variants
2. Disambiguation of authoritative data during information search and submission
Result:
• Relevant information about R&D activity
• Lists of publications
• Scientific reporting
• Bibliometrics & scientometrics

JINR CRIS & OAR Systems

System | Purpose | Software | Viewpoint
JDS (JINR Document Server) | Open Access Repository of materials concerning the R&D activity | Invenio, ©CERN | from file
PIN (Personal Information System) | Staff information: employment profiles, bibliographic archive, projects' information | ©JINR | from person
IDC (Integrated Digital Conferencing System) | Scientific activities management: entire lifecycle for conferences, meetings, lectures | Indico, ©CERN | from event

JINR Document Server (JDS)
JDS was created and is being developed as an institutional repository with the following content:
1. Research and science-related documents:
  - publications co-authored by JINR researchers;
  - archive documents that describe all the essential stages of JINR research activity;
2. Documents providing informational support for the scientific and technological research performed at JINR.

JDS: Information Services
• Search and navigation,
• Creation of user groups,
• Saving of search results,
• Individual and group bookshelves,
• Manuscript deposition,
• Discussion of publications,
• Sending of alerts and messages.
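The two authority-control tasks named in the slides above (accounting for all name variants, and disambiguating them at search and submission time) can be illustrated with a toy sketch. This is not how JDS or Invenio implement authority control: the record layout, the sample names, the identifiers "A0001"/"A0002" and the helpers `normalize`, `build_index` and `resolve` are all hypothetical.

```python
"""Toy authority file: map every known name variant to one authority record."""

# Hypothetical authority records; IDs and names are invented for illustration.
AUTHORITIES = {
    "A0001": {
        "preferred": "Ivanov, Ivan",
        "variants": ["Ivanov, I.", "I. Ivanov", "Iwanow, Iwan"],
    },
    "A0002": {
        "preferred": "Petrova, Anna",
        "variants": ["Petrova, A.", "A. Petrova"],
    },
}


def normalize(name):
    """Lower-case, drop punctuation, and sort tokens so word order is ignored."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def build_index(authorities):
    """Build a lookup table from normalized name variant to authority ID."""
    index = {}
    for auth_id, record in authorities.items():
        for name in [record["preferred"], *record["variants"]]:
            index[normalize(name)] = auth_id
    return index


def resolve(name, index):
    """Return the authority ID for a name as typed, or None if it is unknown."""
    return index.get(normalize(name))


INDEX = build_index(AUTHORITIES)
print(resolve("I. Ivanov", INDEX))
print(resolve("Sidorov, S.", INDEX))
```

A lookup like this only covers variants that are already recorded. Two different people sharing the same initials would collide in the index, which is where identifiers such as ORCID and human curation come in.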
Invenio
Software:
• Unix-like OS: GNU/Linux distributions Debian, Gentoo, Scientific Linux (RHEL-based), Ubuntu
• HTML, CSS, JS
• Python 2.7.5+
• MySQL
• Redis
Architecture
[Figure: Invenio architecture]

http://jds.jinr.ru
• Collection trees: collections and subcollections
• Collection "Books"
• Information card of a resource
• Attachment of a resource to a collection

Authority Control Realization
Solved by: MARC21 Authorities + Invenio v1.2.1 API
MARC21 authorities:
• Repeatable linking fields (fields 4xx, 5xx)
• Horizontal linking (subfield $w: $wa = predecessor, $wb = successor)
• Vertical linking (subfield $w: $wt = parent)
• Repeatable System Control Number (field 035)
• Repeatable Standard Technical Report Number (field 027)
Module BibAuthority:
• Enrichment of bibliographic data with data from authority records
• Re-indexing of bibliographic records that link to recently updated authority records
• Cross-referencing between MARC records ($0 subfields)

Collection Authorities
http://jds-test3.jinr.ru
• Collection Institutes: record JINR, record LIT; detailed information; Institute → Publication
• Collection People: detailed information about an author; Author → Publication
• Experiment → Publication
• Grant → Author → Publication
• Collection code: MARC tag 980 defines which documents belong to a given collection

Thesaurus
Repository: a place for storing and maintaining any data.
Archive: a collection of information resources plus a classification system (catalog).
Knowledge: a form in which the results of human cognitive activity exist and are systematized.
Knowledge (of a subject): a confident understanding of a subject and the ability to work with it, make sense of it and use it to achieve goals.
Missing knowledge: knowledge that is known to humanity but unknown to a particular person at the moment (for example, a student facing a new subject in a curriculum).
Knowledge in the broad sense: a subjective image of reality in the form of concepts and ideas.
Knowledge in the narrow sense: possession of verified information (answers to questions) that makes it possible to solve a given problem.
Knowledge in the theory of artificial intelligence (AI) and expert systems: information and inference rules about the world, the properties of objects, and the patterns of processes and phenomena, together with rules for using them in decision-making.
New knowledge: information about the existence of objects or their properties, or of real processes and phenomena, that was previously unknown to science and is not part of the existing system of human ideas about the world.
Open Access (OA) to Research: a mode of scientific communication in which authors exercise their right to publish their work so that anyone can access it from any place and at any time of their choosing.
Open Archives Initiative (OAI): an organization that develops and promotes technical interoperability standards that let archives share catalog information (metadata).
Self-archiving: the deposit of digital documents (metadata + full text) in an OAI-compliant archive.
"Proxy" self-archiving: deposit on behalf of authors who feel personally unable (too busy or technically incapable) to self-archive.
Harvesting: automatic gathering of metadata across repositories.
OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting.
Metadata: structured data that describes the characteristics of a resource ("An Introduction to Metadata", by Chris Taylor, University of Queensland). In short, data about data.
Example (a book):
  Title: Pushkin's Fairy Tales
  Date of Publication: 2012
  Author: Alexander Pushkin
  Editor: Williams Paul
  Translator: Elton Oliver, Krup Jacob
  Publisher: Bright City
Structure:
• Type of Resource
• Title
• Description
• Source
• Date
• Author
• Creator
• …
MARC21 (MAchine-Readable Cataloguing): an international standard for bibliographic data. A MARC bibliographic record consists of three main components: the Leader, the Directory, and the variable fields (http://www.loc.gov/marc/bibliographic/).
Field groups:
  00X: Control Fields
  01X-09X: Numbers and Code Fields
  1XX: Main Entry Fields
  20X-24X: Title and Title-Related Fields
  25X-28X: Edition, Imprint, Etc. Fields
  3XX: Physical Description, Etc. Fields
  4XX: Series Statement Fields
  5XX: Note Fields
  6XX: Subject Access Fields
  70X-75X: Added Entry Fields
  76X-78X: Linking Entry Fields
  80X-83X: Series Added Entry Fields
  841-88X: Holdings, Location, Alternate Graphics, Etc. Fields
Example fields:
  035: System Control Number (Repeatable)
  100: Main Entry, Personal Name (Not Repeatable)
  245: Title Statement (Not Repeatable)
  700: Added Entry, Personal Name (Repeatable)
[Figure: a MARC record with its fields, subfields and subfield values labelled]
XML (EXtensible Markup Language): a metalanguage (a language for describing other languages) and a universal format for structured documents and data, derived from SGML (Standard Generalized Markup Language). http://www.w3.org/XML/
Example (the first line is the prolog; PRODUCTS is the root element; each element consists of an opening tag, element content and a closing tag):
  <?xml version="1.0" encoding="utf-8"?>
  <PRODUCTS>
    <PRODUCT>
      <TITLE>Product #1</TITLE>
      <PRICE>10.00</PRICE>
    </PRODUCT>
    <PRODUCT>
      <TITLE>Product #2</TITLE>
      <PRICE>20.00</PRICE>
    </PRODUCT>
  </PRODUCTS>
MARCXML: a framework for working with MARC data in an XML environment (http://www.loc.gov/standards/marcxml/).
• Tag datafield = MARC field
• Tag subfield = MARC subfield
• Element content = MARC subfield value
Institutional Repository: the Open Access idea, digital library tools, and scientific and educational activity come together in institutional repositories in the form of Open Access.
I. Digital collection: collection and preservation of the intellectual output of an organization.
II. Set of services that lets collaboration members manage and distribute digital resources.
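The MARCXML mapping given in the glossary (datafield = MARC field, subfield = MARC subfield, element content = subfield value) can be sketched with Python's standard `xml.etree.ElementTree`. The sample record below is invented for illustration: it reuses the title and author from the metadata example, the 980 value "BOOK" is a made-up collection code, and the helper `subfield_values` is hypothetical rather than part of Invenio or any MARC library.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC 21 slim") documents.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented sample record: 100 = main entry (personal name),
# 245 = title statement, 980 = collection tag as used on the slides.
SAMPLE = """
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">Pushkin, Alexander</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Pushkin's Fairy Tales</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">BOOK</subfield>
  </datafield>
</record>
"""


def subfield_values(record, tag, code):
    """Return the text of every subfield `code` inside every datafield `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [element.text for element in record.findall(path, NS)]


record = ET.fromstring(SAMPLE.strip())
print(subfield_values(record, "100", "a"))
print(subfield_values(record, "245", "a"))
print(subfield_values(record, "980", "a"))
```

The helper returns a list because MARC fields and subfields may repeat (for example field 700 or 035), so a single tag/code pair can yield several values.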
CERIF: Common European Research Information Format
1) CERIF is an EU Recommendation to Member States (http://cordis.europa.eu/cerif/).
2) The European Commission (EC) has authorised euroCRIS to maintain and develop CERIF and its usage (http://www.eurocris.org/cerif/cerif-releases/).
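As presented in this deck, the core idea of CERIF is a data model in which research entities are connected by formal (semantic) relationships. The sketch below is only an illustration of that idea and not the CERIF schema: the entity names are borrowed from the instance diagram (Person A, OrgUnit M, OrgUnit O, Project P, Publication X), while the pairing of roles with specific entities, the `Entity` and `Link` classes and the helper `related` are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A research entity such as a person, organisational unit or project."""
    kind: str
    name: str


@dataclass(frozen=True)
class Link:
    """A typed (semantic) relationship from one entity to another."""
    source: Entity
    role: str
    target: Entity


person_a = Entity("Person", "A")
orgunit_m = Entity("OrgUnit", "M")
orgunit_o = Entity("OrgUnit", "O")
project_p = Entity("Project", "P")
publication_x = Entity("Publication", "X")

# Assumed pairings; the flattened diagram does not say which role joins which pair.
LINKS = [
    Link(person_a, "member", orgunit_m),
    Link(person_a, "project leader", project_p),
    Link(person_a, "author", publication_x),
    Link(orgunit_m, "part of", orgunit_o),
]


def related(entity, links, role=None):
    """Return the targets linked from `entity`, optionally filtered by role."""
    return [
        link.target
        for link in links
        if link.source == entity and (role is None or link.role == role)
    ]


print(related(person_a, LINKS, role="author"))
print(related(person_a, LINKS))
```

Keeping the relationship as its own record, rather than as a field on the person or the publication, is what allows n:m links and new roles to be added without changing the entities themselves.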