News from the Publications Office
Norbert Hohn
Publications Office
Eurolib Plenary Meeting, Lisbon, 19-20 May 2011
News about…
• Virtua OPac
• EUBookshop
• Cellar
• EuroVoc
• Metadata Registry (MDR)
News from the cataloguing service
virtua : an off-the-shelf cataloguing tool for OP
 Agenda
• Project background
• Going into production
• Challenges and benefits
• What next?
virtua : project background
 What were we looking for?
 An off-the-shelf ILMS – cataloguing module and OPAC module
• To enable cataloguing to be done in-house
 A web OPAC
• To allow external users to search and download OP
bibliographical records (replacement for LIBCO)
virtua: project background
25/02/2009 Launch of Call for Tender AO 10021 for an integrated
library management system
13/07/2009 Award of contract to VTLS Europe, S.L. (virtua)
01/09/2009 Kick-off meeting
27/04/2010 Initial projected start date
14/12/2010 Start of production
virtua: project background
 What caused the delays?
 Requirement to communicate with several proprietary
systems
 Complex migration scenario
 Two additional projects required in order to go live:
• Codes
• Punctuation
virtua: codes project
Example notice from virtua with codes
041 0   $a eng $1 EN
044     $c eu
084     $a M11 $2 LU-LuOPE
245 1 0 $a CEMP, the creation of European management practice : $b final report.
260     $a {LUXB} : $b OPL, $c 2004.
300     $a III, 127 NPAG : $b NFIG, NTAB ; $c A4 $d BR.
440     $a EUR_SER_C ; $v 20968, $x 1018-5593
504     $a UA_BIB : NPAG 90-97.
540     $a REPRO1.
650   7 $a 003656. $2 EUROVOC
…
710 2   $a CEU. $b RTD.
773 1 8 $t EUR_SER_C $q 2004, NPER 20968
…
910     $a GR
920     $a 702
Codes and their translations
• M11 = Theme (Social Sciences Research)
• {LUXB} = Luxembourg
• OPL = Publications Office
• NPAG = p.
• NFIG = ill.
• NTAB = tab.
• BR = softcover
• EUR_SER_C = EUR. EU socio-economic research
• UA_BIB = Bibl.
• REPRO1 = Reproduction is authorised provided the source is acknowledged
• 003656 = Community research policy
• CEU = European Commission
• RTD = Directorate-General for Research
• NPER = No
• GR = Free
• 702 = Specialised
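The code-to-text translation shown above can be sketched in a few lines of Python. This is only an illustration, not the virtua implementation: the table holds a handful of the English translations listed on the slide, and the names `CODE_TABLE`, `TOKEN` and `decode_field` are hypothetical. The real system would select translations according to the language of the publication.

```python
import re

# Hypothetical code table, copied from the translations listed on the slide.
CODE_TABLE = {
    "{LUXB}": "Luxembourg",
    "OPL": "Publications Office",
    "NPAG": "p.",
    "NFIG": "ill.",
    "NTAB": "tab.",
    "BR": "softcover",
    "UA_BIB": "Bibl.",
}

# A candidate code is either {UPPERCASE} or a run of capitals, digits and
# underscores that starts with a capital letter.
TOKEN = re.compile(r"\{[A-Z]+\}|[A-Z][A-Z0-9_]+")


def decode_field(value, table=CODE_TABLE):
    """Replace every known code in a field string; leave other text as-is."""
    return TOKEN.sub(lambda m: table.get(m.group(0), m.group(0)), value)


print(decode_field("$a {LUXB} : $b OPL, $c 2004."))
```

Tokens that are not in the table (for example `III` or `A4`) pass through unchanged, so ordinary text in a field is not affected.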
virtua: punctuation project
Example notice from virtua with automatically added punctuation
…
245 1 0 $a CEMP, the creation of European management practice : $b final report.
260     $a {LUXB} : $b OPL, $c 2004.
300     $a III, 127 NPAG : $b NFIG, NTAB ; $c A4 $d BR.
440     $a EUR_SER_C ; $v 20968, $x 1018-5593
499     $a Project SOE1-CT97-1072
504     $a UA_BIB : NPAG 90-97.
…
540     $a REPRO1.
650   7 $a 003656. $2 EUROVOC
…
700 1   $a Engwall, Lars, $e ED.
710 2   $a CEU. $b RTD.
773 1 8 $t EUR_SER_C $q 2004, NPER 20968
…
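A minimal sketch of how punctuation could be added automatically between subfields. The rules below are assumptions inferred only from the example notice on this slide (fields 245, 260, 300, 440, 504, 540); the actual rule set used in virtua is not given in the presentation, and the names `PRECEDING_MARK`, `FINAL_STOP` and `punctuate` are hypothetical.

```python
# Assumed rules, inferred from the example record only: the mark placed
# BEFORE a subfield, keyed by (tag, subfield code).
PRECEDING_MARK = {
    ("245", "b"): " :",
    ("260", "b"): " :",
    ("260", "c"): ",",
    ("300", "b"): " :",
    ("300", "c"): " ;",
    ("440", "v"): " ;",
    ("440", "x"): ",",
}

# Fields that end with a full stop in the example record.
FINAL_STOP = {"245", "260", "300", "504", "540"}


def punctuate(tag, subfields):
    """Join (code, value) pairs into one field string with punctuation added."""
    parts = []
    for i, (code, value) in enumerate(subfields):
        if i > 0:
            # Attach the mark to the end of the previous subfield.
            parts[-1] += PRECEDING_MARK.get((tag, code), "")
        parts.append(f"${code} {value}")
    text = " ".join(parts)
    if tag in FINAL_STOP and not text.endswith("."):
        text += "."
    return text


print(punctuate("260", [("a", "{LUXB}"), ("b", "OPL"), ("c", "2004")]))
```

Keeping the rules in a lookup table rather than in code means cataloguers enter only the subfield values, and a rule can be changed without touching the records.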
virtua: going into production
 Migration of >200 000 records from PROCATX (OP’s database for
legal and general publications metadata) to virtua
 Re-import of these records from virtua to PROCATX
(synchronisation of both systems)
 Parallel running with external cataloguing contractor until
18.01.2011
virtua: challenges and benefits
 Challenges:
 Learning a new system
 Creating bibliographical records (as opposed to controlling
them)
 Adapting our workflows
 Indexing of records using EuroVoc
 Benefits:
 Shorter turnaround for cataloguing publications (from 3 days
down to 24 hours)
 Automated validation checks to ensure quality and consistency
 Autonomy, enabling rapid intervention in records when
requested
 And not least, increased team spirit
virtua: what next?
 Opening of OPac to current LIBCO users – March 2011
 Possibility for users to export notices in MARC, CSV and
EndNote formats
 Deep-linking to EUbookshop of all records held in Virtua
(MARC21 field 856)
 Production of prepublication records
 Automatic activation of DOIs via an export from virtua
 Shorter delay between a publication appearing on EUB and
activation of its DOI
OPac - the OP online public access catalogue
 Out of the box OPAC of Virtua (Chamo)
 http://opac.publications.europa.eu/
 The interface does not require a specific login, but we do not publicise it
and give the address only to 'approved' users.
 Lets users discover materials quickly, using familiar search methods such as
Quick Search and faceted result links.
 Refining a search is as easy as picking a facet from a list or typing additional
terms in the search box and letting OPac add them to the original search
string.
 Advanced search gives users the advantage of applying multiple filters
simultaneously.
 Users are able to export references to EU general publications in formats
designed specifically for the library world (e.g. MARC21) as well as in
EndNote or CSV format.
OPac - the OP online public access catalogue
The tabs of the menu bar
Login
For administrators only.
Heading
To make searches by author, subject, title, and PUB_ID/workflow (catalogue number).
Cart
To store all records selected by user and to export them.
Clear session
Resets all searches done during the current session, empties the cart and returns the user
to the first page.
Caveat: one peculiarity of the OPac service.
 OP uses a system of codes in records (e.g. 260 $a {LUXB} :) in order to
produce each record in the language of the publication catalogued.
 Although the facets display the translated values of these codes for the end
user, the MARC records themselves are still displayed on screen in coded form.
However, when you add records to the cart and then download them (by
selecting 'Export records to MARC'), the codes are automatically translated and
you receive decoded notices in the resulting file (e.g. 260 $a Luxembourg :)
for import into your system.
Feedback
As this is a new service, we welcome any feedback from our users, including
ways in which we can improve it. If you need any further help or would like to
propose any improvements, please contact our team using the following
address:
opoce-procat-prod@publications.europa.eu
Deep-linking to EUbookshop of all records held in Virtua
(MARC21 field 856)
Purpose:
 Only records since February 2011 systematically have a deep-link to EUBookshop.
 By adding a deep-link to the bibliographical records for General Publications, anyone
using these records will automatically be able to redirect their end users to the
Publication Details page on EUB, where the user can order or download the
publication they are looking for, even if the library/information centre displaying the
record does not itself hold a copy of the publication.
Actions:
 Retrospectively adding a deep-link in field 856 to all existing bibliographical
records (250 000). A record may carry more than one 856 link, i.e. the deep-link is
added alongside the DOI (link to resolver).
 Updating the import workflows into virtua so that all new records are given this
deep-link by default.
Proposed date of putting into production: June 2011
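The retrospective action can be pictured as a small, repeatable update per record. The sketch below is an assumption-laden illustration: a record is modelled as a plain dictionary from MARC tag to a list of field strings, the function name `add_deep_link` is hypothetical, and the URLs are placeholders because the slides do not give the EU Bookshop link pattern. Only the use of 856 $u for a URI follows the MARC21 format itself.

```python
def add_deep_link(record, url):
    """Return a copy of `record` with an extra 856 field carrying `url`.

    Existing 856 fields (for example a DOI resolver link) are kept, so
    several links can coexist in one record.
    """
    field = f"$u {url}"
    updated = {tag: list(fields) for tag, fields in record.items()}
    links = updated.setdefault("856", [])
    if field not in links:  # idempotent: safe to re-run over the backlog
        links.append(field)
    return updated


# Placeholder URLs for illustration only.
record = {
    "245": ["$a CEMP, the creation of European management practice : $b final report."],
    "856": ["$u https://doi.example.org/placeholder"],
}
linked = add_deep_link(record, "https://bookshop.example.org/placeholder")
print(linked["856"])
```

Because the function skips a link that is already present, the same routine could serve both the one-off backlog run and the default step in the import workflow.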
Customers:
 Current LIBCO clients
 Pilot project with the British Library, which would result in some 50 000 records being
made available in the UK through a syndication of libraries
Pre-publication records
 We plan to create a preliminary record (prenotice) before
the publication is finalised
 Might be interesting for Eurolib members to be alerted to new
publications
Automatic activation of DOIs via an export from virtua
 Automatic mapping from MARC21 to ONIX for DOI
 Shorter delay between a publication appearing on EUB and activation of
its DOI
News from the EUBookshop
Metadata added to the publication detail page: target audience and EuroVoc descriptors. These keywords are browsable.
New "discover" section - a menu through which users can access thematic collections of publications that cannot
be easily retrieved by site search or browsing. The compilation is often informed by frequently searched terms,
such as maps or comics.
A "just published" section - recently published titles
News from the CELLAR
Common Access to EU Information
Common access to EU information
Vision
To make available at a single place all metadata and
digital content managed by the Publications Office in a
harmonised and standardised way in order:
To guarantee citizens better access to the law and
publications of the European Union;
To encourage and facilitate reuse of content and
metadata by professionals and experts;
To preserve content and metadata and access to contents
and metadata over time.
Common access to EU information
Present: silos = independent solutions
[Diagram labels: Authors; Production (CORDIS, General Publications, Official Publications, Tendering Documents); Dissemination (CORDIS, TED, EUR-Lex, EU Bookshop, Specialized portals); Citizens/Professionals]
Common access to EU information
Future: harmonized architecture = common & shared solutions
[Diagram labels: Authors; Production (CORDIS, General Publications, Official Publications, Tendering Documents); Dissemination (Common portal, Specialized portals); Citizens/Professionals]
Common access to EU information
Target architecture
Common access to EU information
CELLAR – Functional architecture 1/3
Data layer
 Reception, technical validation and storage of content and metadata.
[Diagram labels: Production; Data flows (Official publications, Tendering documents, General publications, CORDIS, External sources (Court of Justice…)); Post-production validation; Data layer; Definition layer (Reference); Dissemination layer (Publishing); Archive (Long-term preservation)]
Common access to EU information
CELLAR – Functional architecture 2/3
Definition layer
 Repository models (CCR and CMR), business rules (for uploading, archiving and
dissemination), transformation rules, EuroVoc dissemination,
authority tables including translations.
[Diagram labels: Production; Data flows (Official publications, Tendering documents, General publications, CORDIS, External sources (Court of Justice…)); Post-production validation; Data layer; Definition layer (Reference); Dissemination layer (Publishing); Archive (Long-term preservation)]
Common access to EU information
CELLAR – Functional architecture 3/3
Dissemination layer
 Access to and provision of content and metadata in the
requested format and/or presentation.
[Diagram labels: Production; Data flows (Official publications, Tendering documents, General publications, CORDIS, External sources (Court of Justice…)); Post-production validation; Data layer; Definition layer (Reference); Dissemination layer (Publishing); Archive (Long-term preservation)]
Common access to EU information
CELLAR – Based on standards
 OAIS Reference Model
 FRBR
 XML
 METS
[Diagram labels: Production; Data flows (Official publications, Tendering documents, General publications, CORDIS, External sources (Court of Justice…)); Post-production validation; Data layer; Definition layer (Reference); Dissemination layer (Publishing); Archive (Long-term preservation)]
CELLAR – Web 3.0, semantic technology
 SPARQL endpoint
 OWL
 SKOS
 RDF
[Diagram labels: Production; Data flows (Official publications, Tendering documents, General publications, CORDIS, External sources (Court of Justice…)); Post-production validation; Data layer; Definition layer (Reference); Dissemination layer (Publishing); Archive (Long-term preservation)]
Digital archive of the EU
Content
 Complete collection of EU legal documents including
• Treaties
• Official Journal
• Case-law
• Preparatory acts
• Consolidated acts
• …
 General publications
 Research reports
 Merger taskforce decisions
Common access to EU information
CELLAR – A service enabler
On-line access
Provide on-line access through the Internet portals of the
Publications Office.
Automated access
Provide suitable interfaces for access by automated
agents.
External indexing
Enable indexing by Internet search engines.
Notification
Provide configurable notification services (RSS-feeds…).
Downloading
Support sporadic and regular downloading of resources
(subscription). Regular downloading should be
configurable.
Strategic formats
PDF, in particular PDF/A-1a and PDF/A-1b; XML; TIFF
Specific formats
Provide formats, which are not natively available in the
CELLAR (LegisWrite, ONIX notices…), i.e. transformation
services.
Deep linking
Enable external referencing of resources and guarantee
persistence of links over time.
Common access to EU information
CELLAR – ROADMAP
2010/2011
development (ongoing)
2011
data migration and upload (ongoing)
2012
online (planned)
News from Eurovoc
EuroVoc – Next releases 4.4
 Next release in Summer 2011 (EuroVoc 4.4)
 Update linked to the new “Lisbon Treaty”
• EC → EU
• European Community → European Union
 You can contribute via the website
 Permanent URI and ID for thesaurus Terms and concepts
 LOD (Linked Open Data)
 Concepts are never deleted; instead they are marked as
 obsolete (with a "use instead" reference)
 deprecated (moved as a non-preferred term of a new concept)
EuroVoc – TAE Project - Purpose
 TAE = Thesaurus Alignment Environment
 Initiative of the Publications Office
 Mapping = matching
 Create semantic correspondences between concepts of two thesauri
 Objective: Map EuroVoc to
 ETT - European Vocational Training Thesaurus (Cedefop)
 GEMET - General Multilingual Environmental Thesaurus (European Environmental
Agency)
 Directory of European Legislation in force (EUR-Lex)
 EuroVoc 4.2
 Taxonomy EUB
ETT
EuroVoc – TAE Project - Approach
 Project participants
 Mondeca (Paris) – Alignment tools
 Inria (Grenoble) – Matching algorithms
 Publications Office – Reviewer/validator
 When?
 May 2010 - May 2011
 How? Using advanced semantic technologies
 An interface that makes it possible to:
• Review matching
• Import/export any vocabulary in SKOS (Simple Knowledge
Organization System)/RDF
• Import any matching algorithms
• Import/export any mapping results
EuroVoc – TAE Project – Examples of automated alignments
 Types of correspondences generated by algorithms (in the examples, T1 = GEMET and T2 = EuroVoc)
 ExactMatch – concept T1 = concept T2
• T1 acid rain exact match T2 acid rain
 BroadMatch – concept T1 has a generic concept in T2
• T1 animal genetics broad match T2 genetics
 NarrowMatch – concept T1 has a specific concept in T2
• T1 mammal narrow match T2 wild mammal
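The three correspondence types line up with the standard SKOS mapping properties (`skos:exactMatch`, `skos:broadMatch`, `skos:narrowMatch`). As a rough sketch of how such results might be held in memory before export, assuming a hypothetical `Correspondence` class that is not part of the TAE tool:

```python
from dataclasses import dataclass

# SKOS mapping properties corresponding to the three match types.
SKOS_PROPERTY = {
    "ExactMatch": "skos:exactMatch",
    "BroadMatch": "skos:broadMatch",
    "NarrowMatch": "skos:narrowMatch",
}


@dataclass(frozen=True)
class Correspondence:
    source: str  # concept label in T1 (GEMET in the slide's examples)
    match: str   # one of the keys of SKOS_PROPERTY
    target: str  # concept label in T2 (EuroVoc)

    def as_triple(self):
        """Return (subject, SKOS property, object) for this correspondence."""
        return (self.source, SKOS_PROPERTY[self.match], self.target)


# The three examples from the slide.
EXAMPLES = [
    Correspondence("acid rain", "ExactMatch", "acid rain"),
    Correspondence("animal genetics", "BroadMatch", "genetics"),
    Correspondence("mammal", "NarrowMatch", "wild mammal"),
]

for c in EXAMPLES:
    print(c.as_triple())
```

In a real export the subject and object would be concept URIs rather than labels; labels are used here only to keep the example readable.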
EuroVoc – TAE Project – Practical use (overview)
 Indexing
 Detailed and enriched indexing
 Automatic indexing and re-indexing
 Double annotation
 Retrieving - Semantic extension
 Integration of results into search engines
 Facilitate users' searches – "Did you mean…?"
 Refining the search: extend or narrow the search results
 Results stored in CELLAR
 A single storage and dissemination platform of the Publications Office
for access to European law and publications
 SKOS web services and SPARQL endpoint for accessing and querying
the mapping results
EuroVoc – TAE Project – Practical use: Help to indexing
 Annotating a document by indexing it with a specialized thesaurus
 "whaling" is not represented in EuroVoc
 but GEMET contains "whaling"
 Example in EUR-Lex
EuroVoc – TAE Project – Practical use: Help to indexing
 Correspondences (GEMET – EuroVoc) proposed in TAE
 "whaling" exactMatch "whale" AND "hunting regulation"
 Compound Mapping
EuroVoc – TAE Project – Practical use: Help to information retrieval
 Search engines
 Did you mean … ?
 Automatic query expansion or restriction
 Search for Whale
 Did you mean … ?
Whaling
– Restrict the search results towards a more specific
concept in the target thesaurus
Whale or Marine mammal
– Expand the search results towards a more generic
concept in the source thesaurus
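The "Did you mean…?" behaviour can be sketched as a lookup over stored mapping results. The dictionaries below hard-code only the whale example from this slide; in practice the data would come from the TAE correspondences, and the names `NARROWER`, `BROADER`, `suggest` and `expanded_query` are hypothetical.

```python
# Hypothetical mapping data for the "whale" example on the slide.
NARROWER = {"whale": ["whaling"]}        # more specific concepts
BROADER = {"whale": ["marine mammal"]}   # more generic concepts


def suggest(term):
    """Return suggestions that restrict or expand a search term."""
    key = term.lower()
    return {
        "restrict": NARROWER.get(key, []),
        "expand": [key] + BROADER.get(key, []),
    }


def expanded_query(term):
    """Build an OR query from the term and its more generic concepts."""
    return " OR ".join(f'"{t}"' for t in suggest(term)["expand"])


print(suggest("Whale"))
print(expanded_query("Whale"))
```

A term with no stored correspondences simply yields no restriction and an expansion containing only itself, so the search falls back to the original query.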
EuroVoc – Future actions
MetaThesaurus Working Group
 Main purpose
 Set up a specialized, multilingual thesauri network around EuroVoc
 Meeting foreseen in June 2011
 Advantages
 Use the same standards and formats
 Delegate the maintenance of specific domains
 Share candidates and translations
 Participants:
 EU Institutions, European agencies
 International institutions (FAO, Unesco)
 Other multilingual thesauri (EINIRAS)
 First approach made during the EuroVoc Conference (Luxembourg,
November 2010)
EuroVoc – Refresher of its benefits
 Enterprise Content Categorization
 Developing from scratch
• Time consuming to build a taxonomy or controlled vocabulary
 Use “Starter” metadata to speed-up the development
• Import external metadata, taxonomies or controlled vocabulary
in your ECM system
• Avoiding duplicate efforts
• Minimize the cost of adding and managing metadata
 EuroVoc = a Building block of your ECM application
 A high-level controlled vocabulary
 Cost benefit : maintained by the Publications Office
 Offers different levels of specificity (TAE, thesauri
collaboration network)
EuroVoc within the OP Cellar
 In the repository will be stored:
 EuroVoc, the thesaurus
 The mapping or alignment results
 On the Cellar service layer, EuroVoc will be implemented as web
services and a SPARQL endpoint, e.g. for
 Linked Open Data
 Crosswalk between EuroVoc and Semantic Web applications
 Dereferenceable URIs
 Examples
• Search for a term (expression or URI) and retrieve its alignments
• Search for a term (expression or URI) and retrieve its relations
(Broader Term, Narrower Term, Related Terms)
• Search for a microthesaurus and retrieve all its terms
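As an illustration of the second example (retrieving a term's relations), the sketch below only builds the text of a SPARQL query using standard SKOS properties. It does not contact any service: the endpoint address is not given in the slides, and the function name `relations_query` is hypothetical.

```python
def relations_query(label, lang="en"):
    """Build a SPARQL query listing broader, narrower and related concepts."""
    # Escape backslashes and double quotes so the label is a valid literal.
    safe = label.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?relation ?otherLabel WHERE {{
  ?concept skos:prefLabel "{safe}"@{lang} .
  VALUES ?relation {{ skos:broader skos:narrower skos:related }}
  ?concept ?relation ?other .
  ?other skos:prefLabel ?otherLabel .
  FILTER (lang(?otherLabel) = "{lang}")
}}
""".strip()


print(relations_query("whale"))
```

The same pattern would cover the alignment example by listing the SKOS mapping properties (`skos:exactMatch`, `skos:broadMatch`, `skos:narrowMatch`) in the `VALUES` clause instead.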
EuroVoc – Licensing policy
 Free of charge (4 years)
 Email: copyright-info@publications.europa.eu
 Information in the website under “legal notice”
 Login and Password to download the SKOS or XML
 Alert once a new release is available
 405 licences (64 for 2010, 64 for 2009)
 Types of licence
 Indexing
• Text mining and extraction, automatic indexing and categorization,
• Library Information System, Knowledge Management & ECM
 Translation (Albanian)
 Academic, project, research
• Semantic technologies
• Term matching
Eurovoc mappings
Contact
Ms Christine Laaboudi
Christine.Laaboudi@publications.europa.eu
News from the Metadata Registry
What is the Metadata Registry (MDR)?
 A central reference point for the registration and maintenance of
metadata definitions and related authority data used by
 The interinstitutional systems supporting the decision making
process
 The production and dissemination systems of the Publications
Office
 A framework for the harmonisation and standardisation of the
metadata used in this context
 Documentation
 Organisation
 Procedures
 Provide the reference metadata for reuse and validation purposes
to internal and external clients/client systems in human and
machine-readable format
Metadata Register – Scope
 Core metadata
 Limited set of metadata, which needs to be adopted by every
institution to enable interoperability, in particular in the
context of the decision making process
 Common part of the Metadata register
 Management on interinstitutional level (IMMC)
 Specific metadata
 Metadata dedicated to the specific internal needs of each
institution
 Out-of-scope for the common part of the Metadata register
 Private workspace inside the Metadata register could be
provided to facilitate management by the owner
Metadata Registry – Expected benefits
 Central reference location for metadata definitions and authority
data
 Reference source for consultation/validation purposes
 Stimulates reuse of metadata and increases interoperability
 Framework for harmonization and standardization
 Platform for collaboration and knowledge exchange in metadata
domain on interinstitutional level
Metadata Register - Architecture
 Back-end application
 Maintenance of metadata definitions and authority data
 Access limited to restricted number of expert users
 Based on same tool as used for Eurovoc back-end (ITM)
 Possibility to create individual workspaces
 Registration workflow (JIRA)
 Metadata Registry website (front-end)
 Browse MDR content (read access)
 Detailed information about registered items
 Possibility to submit proposal for registration/feedback (e.g. by
Eurolib members)
Metadata Register – Workflow overview
Metadata Registry - Organisation (proposal) 1/2
 Publications Office level
 Management of changes in MDR by Metadata Register Team
(MRT)
 Interinstitutional level
 Proposals for registration by Interinstitutional Metadata
Maintenance Committee (IMMC) (2 members per institution)
 Submission of relevant proposals by MRT to IMMC for approval
 Technical support/evaluation by MRT on request
 Management of changes in MDR by MRT
 Supervision by Interinstitutional Metadata Steering Committee
(IMSC) composed of the alternate members (suppléants) of the
Management Board of the Publications Office
Metadata Registry – Organisation 2/2
Common Authority Tables (CAT) – April 2011
Common Authority Tables                      Source
Languages (ISO 639/1, 639/2B|T, 639/3)       ISO
Countries (ISO 3166/1-α2 and α3, 3166/3)     ISO
NTU (incl. NUTS and ISO 3166-2)              ISO + UNO + Eurostat
Currencies (ISO 4217)                        ISO
Corporate Bodies                             Various
Roles                                        LC + EurLex + Prelex
Places (locations, towns)                    UN-LOCODE
Resource format (incl. dimensions)           ONIX + IANA
Resource type (categories of resources)      Internal sources
Target Audience                              ONIX
Procedures                                   PreLex
Events                                       PreLex
Etc.
Status legend: in progress, stable version, to be started
Metadata Registry - Roadmap
 Project kick-off: 20/12/2010
 Phase 1: Implementation of back-end application (management
of ontology, authority tables, export)
Target date: June 2011
 Phase 2: Implementation of front-end application
Target date: August 2011
MDR project contacts
 Metadata Registry team:
Holger BAGOLA
Corinne FRAPPART
Madeleine KISS
Martin SCHERBAUM
Willem VAN GEMERT
 Contact:
OP-METADATA-REGISTRY@publications.europa.eu
Thank you for your attention!
We appreciate your questions and
suggestions.