CERIF/euroCRIS and Elsevier
Where do they meet?
Prague, November 9, 2010
M’hamed el Aisati, Head of Product Technology, S&T Elsevier
Outline
• What’s Elsevier (S&T) from a data and technology perspective?
• Data types
• Data processing
• Data technology adopted and deployed at Elsevier
• From Elsevier data models to CERIF
• Is there a role for a publisher?
• What role
• More opportunities

Elsevier S&T = Scientific Data + Technology + much more
> 43 M Abstracts (A&I)
> 10 M Full-text Articles
> 55K Main organization profiles
> 20 M Author profiles
> 60 M Patents
> 1 M Awarded grants
> 10 years of ScienceDirect and > 5 years of Scopus usage/analytics data
> 10 K Books
> 500 M Quality scientific web pages

Scopus coverage
A rich and extended coverage, including:
• Abstracts and citations from 5000 publishers (ELS 15%)
• 3.6 million conference papers (10% of Scopus records)
• “Articles in Press” from more than 3000 titles
• 23 million patents
• 1,200 Open Access journals
• 80% of all Scopus records have an abstract
• Abstracts going back to 1823 (Scopus includes all historical material of ELS, Springer, ACS, AIP, Nature, Science, etc.)
• Nearly 2,700 Arts & Humanities titles
• 430 million integrated scientific websites via Scirus.com
Nearly 18,000 titles, including:
• 16,500 peer-reviewed titles
• 600 trade journals
• 350 book series
• Extensive conference proceedings
• 40 languages are covered
[Chart: title counts; data labels ~16,500, 600, 350, 7700, 5440, 1460, 250, 350, 230]
Scopus info on www.info.scopus.com

Scientific Data + Technology provides extra value

“Your companion for a scientific life”
Department head
Librarian
Researcher
Funding agent
Manager/Admin
Dean/Provost

Performance Evaluation
Baden-Württemberg

Australian Research Council – ERA 2010
• Assessment of research quality within Australia's higher education institutions using a combination of indicators and expert review by committees comprising experienced, internationally recognized experts.
• ERA uses leading researchers to evaluate research in eight discipline clusters.
• ERA will detail areas within institutions and disciplines that are internationally competitive, as well as point to emerging areas where there are opportunities for development and further investment.
• Early January 2010 – Aug/Sep 2010
• First trial (PCE) in 2009
• Scopus selected as source information provider and partner
More info on: http://www.arc.gov.au/era/default.htm

Australian Research Council – ERA 2010
3 main components:
- EID tagging
- Dedicated web service (API)
- Reports:
  » Citation Benchmark report (cpp)
  » Centile threshold report
  » Ranked journal ‘Indicative World Distribution’ Benchmark Report
[Diagram: ARC – Scopus – Universities interaction, showing the EID tagging process and the dedicated web service]

Database technologies at Elsevier (1)
XML native database for the large bulk of data, e.g. full-text articles, abstract and indexing records
• No ETL process involved
• “Search interface” as top layer for retrieving data: XQuery queries instead of SQL queries
• No (upfront) data modelling is required
• Leveraging and retaining the original XML structure
• Multiple DTDs and schemas supported concurrently
• A DTD or schema is not a prerequisite for data loading
• With XQuery, whole web applications can be built, i.e. no integration with an additional web programming language (e.g. PHP, JavaScript, etc.)
• However, it is an expensive technology
• Even straightforward loading and querying of huge amounts of data can be challenging
• Requires specific skills

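The points about retaining the original XML structure and querying it by path, without an upfront data model, can be illustrated in miniature. The following is only a sketch in Python: the limited XPath support of the standard-library `xml.etree.ElementTree` module stands in for XQuery, the two sample records and their element names are hypothetical, and no particular XML database product is implied.

```python
import xml.etree.ElementTree as ET

# Hypothetical records with two different structures, loaded as-is:
# no upfront data model, no ETL step, no schema needed for loading.
RECORDS = [
    "<article><title>Wheat transformation</title>"
    "<author><surname>Barret</surname></author></article>",
    "<abstract-record><head><title>Gas sensors</title></head>"
    "<author><surname>Rothschild</surname></author></abstract-record>",
]

store = [ET.fromstring(xml_text) for xml_text in RECORDS]


def titles(documents):
    """Return every <title> text, wherever it sits in each document."""
    # ".//title" is a path query over the original structure; it plays the
    # role an XQuery expression would play in an XML-native database.
    return [t.text for doc in documents for t in doc.findall(".//title")]


def surnames(documents):
    """Return every author surname across the heterogeneous documents."""
    return [s.text for doc in documents for s in doc.findall(".//author/surname")]


print(titles(store))
print(surnames(store))
```

Both record types are queried with the same path expressions even though their nesting differs, which is the practical meaning of "multiple DTDs and schemas supported concurrently".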
Database technologies at Elsevier (2)
RDBMS databases for lightweight information, e.g. article and journal metadata
• Known and established technology (e.g. SQL)
• Typically the heavy lifting is done at the ETL stage in order to boost query performance
• Plenty of open source choices, and thus free (e.g. MySQL); low threshold for adoption
• Ideal for small amounts of information
• ETL process can be lengthy
• XML structure is ‘lost’ once data is loaded; a separate DTD or schema is required for exporting data
• SQL is typically a back-end technology; front-end (web) application programming requires a different language (e.g. PHP, JSP, ASP)
• Data modelling is required; updating the data model usually requires re-loading the data

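The trade-off described for the relational route (model first, flatten during ETL, then query cheaply with SQL) can be sketched as follows. This is an illustration only: SQLite is used because it ships with Python, and the source XML, table name, and columns are all hypothetical rather than taken from any Elsevier system.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical source data; element and attribute names are illustrative.
SOURCE_XML = """
<journals>
  <journal issn="1234-5678"><title>Journal A</title>
    <article id="a1"><title>First paper</title><year>2008</year></article>
    <article id="a2"><title>Second paper</title><year>2009</year></article>
  </journal>
</journals>
"""


def etl(xml_text, conn):
    """Extract article metadata from XML, flatten it, and load it into a table."""
    # The data model has to exist before any data can be loaded.
    conn.execute(
        "CREATE TABLE article (id TEXT PRIMARY KEY, issn TEXT, title TEXT, year INTEGER)"
    )
    root = ET.fromstring(xml_text)
    rows = []
    for journal in root.findall("journal"):
        for article in journal.findall("article"):
            # Flattening: the nesting of article inside journal becomes a
            # plain column value, and the original XML structure is gone.
            rows.append(
                (
                    article.get("id"),
                    journal.get("issn"),
                    article.findtext("title"),
                    int(article.findtext("year")),
                )
            )
    conn.executemany("INSERT INTO article VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
etl(SOURCE_XML, conn)
# After the upfront work, querying is plain SQL.
result = conn.execute(
    "SELECT id, title FROM article WHERE year >= 2009 ORDER BY id"
).fetchall()
print(result)
```

Exporting these rows back to XML would need a separate mapping, since the table no longer records how the elements were nested.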
Elsevier logical fit

Some data models at Elsevier
• Authors are disambiguated and profiled, each with a unique and persistent identifier
• Affiliations are disambiguated and profiled, each with a unique and persistent identifier
• Backward and forward citations captured through reference linking
• Funding data aggregated to affiliations

Simple relational data model example
Covers publications, journals, classifications (disciplines), authors, affiliations, journal metrics, citations, etc.

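The diagram for this slide is not preserved in the text, so the following is only a guess at the general shape of such a model. All table and column names are assumptions made for illustration; the sketch uses SQLite and shows how citations can be rolled up to affiliations through joins.

```python
import sqlite3

# Hypothetical, heavily simplified schema; the slide's actual model is not
# reproduced in the text, so every table and column name is an assumption.
SCHEMA = """
CREATE TABLE journal     (journal_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE affiliation (affiliation_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE author      (author_id INTEGER PRIMARY KEY, name TEXT,
                          affiliation_id INTEGER REFERENCES affiliation);
CREATE TABLE publication (publication_id INTEGER PRIMARY KEY, title TEXT,
                          journal_id INTEGER REFERENCES journal);
CREATE TABLE authorship  (publication_id INTEGER REFERENCES publication,
                          author_id INTEGER REFERENCES author);
CREATE TABLE citation    (citing_id INTEGER REFERENCES publication,
                          cited_id INTEGER REFERENCES publication);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executescript("""
INSERT INTO journal VALUES (1, 'Journal A');
INSERT INTO affiliation VALUES (10, 'University X');
INSERT INTO author VALUES (100, 'Author One', 10);
INSERT INTO publication VALUES (1000, 'Paper P', 1), (1001, 'Paper Q', 1), (1002, 'Paper R', 1);
INSERT INTO authorship VALUES (1000, 100), (1001, 100);
INSERT INTO citation VALUES (1001, 1000), (1002, 1000), (1002, 1001);
""")

# Citations received per affiliation, via author -> authorship -> citation.
CITATIONS_PER_AFFILIATION = """
SELECT af.name, COUNT(*) AS citations
FROM affiliation af
JOIN author au    ON au.affiliation_id = af.affiliation_id
JOIN authorship s ON s.author_id = au.author_id
JOIN citation c   ON c.cited_id = s.publication_id
GROUP BY af.affiliation_id
"""
rows = conn.execute(CITATIONS_PER_AFFILIATION).fetchall()
print(rows)
```

In this toy data, Author One wrote two papers that are cited three times in total, so the single affiliation is credited with three citations.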
Affiliation Profile XML snippet
<xocs:doc content-type="Profile" dbname="scopusbase"
    xsi:schemaLocation="http://www.elsevier.com/xml/xocs/dtd xocs-ip502.xsd"
    xmlns:xocs="http://www.elsevier.com/xml/xocs/dtd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <xocs:meta>
    <xocs:eid>10-s2.0-101718729</xocs:eid>
    <xocs:timestamp>2009-01-09T13:04:06.067735-05:00</xocs:timestamp>
  </xocs:meta>
  <xocs:institution-profile>
    <!-- Unique and persistent affiliation ID -->
    <institution-profile affiliation-id="101718729">
      <status>update</status>
      <date-created year="2008" month="02" day="03"/>
      <date-revised year="2008" month="05" day="14" timestamp="2008-05-14T00:05:34.000034+01:00"/>
      <date-revised year="2008" month="06" day="30" timestamp="2008-06-30T02:09:24.000024+01:00"/>
      <date-revised year="2009" month="01" day="01" timestamp="2009-01-01T13:57:41.000041+00:00"/>
      <date-revised year="2009" month="01" day="09" timestamp="2009-01-09T17:44:11.000011+00:00"/>
      <preferred-name>Balearic Islands Government</preferred-name>
      <sort-name>Balearic Islands Government</sort-name>
      <name-variant>Balearic Islands Government</name-variant>
      <name-variant>Govern Balear</name-variant>
      <name-variant>Govern de les Illes Balears</name-variant>
      <address country="es">
        <address-part>C/. Foners 10</address-part>
        <city>Palma</city>
        <postal-code>07006</postal-code>
      </address>
      ……

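One practical use of such a profile is resolving any known name variant to the single persistent affiliation ID. The sketch below assumes a well-formed excerpt of the profile (the elided parts are simply left out) and uses a hypothetical helper, `build_variant_index`; it illustrates consuming the profile, not how Scopus performs the disambiguation itself.

```python
import xml.etree.ElementTree as ET

# Well-formed excerpt of the affiliation profile; elided parts are omitted.
PROFILE_XML = """
<institution-profile affiliation-id="101718729">
  <preferred-name>Balearic Islands Government</preferred-name>
  <name-variant>Balearic Islands Government</name-variant>
  <name-variant>Govern Balear</name-variant>
  <name-variant>Govern de les Illes Balears</name-variant>
  <address country="es"><city>Palma</city></address>
</institution-profile>
"""


def build_variant_index(profile_xml):
    """Map each lower-cased name variant to the persistent affiliation ID."""
    profile = ET.fromstring(profile_xml)
    affiliation_id = profile.get("affiliation-id")
    return {v.text.lower(): affiliation_id for v in profile.findall("name-variant")}


index = build_variant_index(PROFILE_XML)
print(index.get("govern balear"))
```

A local system that stores only the ID can then match incoming records regardless of which spelling of the institution name they carry.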
Author Profile XML snippet
<!-- Unique and persistent author ID -->
<author-profile id="7401581436" type="author" suppress="false">
  …..
  <preferred-name>
    <initials>A.W.</initials>
    <indexed-name>MacDonald A.</indexed-name>
    <surname>MacDonald</surname>
    <given-name>Alistair W.</given-name>
  </preferred-name>
  <name-variant>
    <initials>A.W.</initials>
    <indexed-name>Macdonald A.</indexed-name>
    <surname>MacDonald</surname>
    <given-name>A. W.</given-name>
  </name-variant>
  …
  <classificationgroup>
    <classifications type="ASJC">
      <classification frequency="7">1306</classification>
      <classification frequency="1">1315</classification>
  <publication-range start="1989" end="2009"/>
  …
  <journal-history type="author">
    <journal type="j">
      <sourcetitle>Clinical Cancer Research</sourcetitle>
  …..
  <!-- Reference to affiliation -->
  <affiliation-current>
    <affiliation affiliation-id="106499546" parent="60019718"/>
  </affiliation-current>
  <affiliation-history>
    <affiliation affiliation-id="104228751" parent="60024340"/>
  </affiliation-history>
</author-profile>

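The author profile links to affiliation profiles purely by identifier. A minimal sketch of following those references, assuming a well-formed excerpt with the elided sections removed (the function name `affiliation_links` is hypothetical):

```python
import xml.etree.ElementTree as ET

# Well-formed excerpt of the author profile; elided sections are omitted.
AUTHOR_XML = """
<author-profile id="7401581436" type="author" suppress="false">
  <preferred-name>
    <initials>A.W.</initials>
    <indexed-name>MacDonald A.</indexed-name>
    <surname>MacDonald</surname>
    <given-name>Alistair W.</given-name>
  </preferred-name>
  <affiliation-current>
    <affiliation affiliation-id="106499546" parent="60019718"/>
  </affiliation-current>
  <affiliation-history>
    <affiliation affiliation-id="104228751" parent="60024340"/>
  </affiliation-history>
</author-profile>
"""


def affiliation_links(author_xml):
    """Return (author_id, current, history); the last two are lists of
    (affiliation_id, parent_id) pairs read from the profile."""
    profile = ET.fromstring(author_xml)

    def pairs(section):
        return [
            (a.get("affiliation-id"), a.get("parent"))
            for a in profile.findall(f"{section}/affiliation")
        ]

    return profile.get("id"), pairs("affiliation-current"), pairs("affiliation-history")


author_id, current, history = affiliation_links(AUTHOR_XML)
print(author_id, current, history)
```

The returned identifiers can be looked up against affiliation profiles such as the one on the previous slide.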
Publication XML snippet
<bibrecord>
  <item-info>
    <copyright type="Elsevier">Copyright 2008 Elsevier B.V.,All rights reserved.</copyright>
    <!-- Unique and persistent publication ID -->
    <itemidlist><itemid idtype="SCP">34147094726</itemid>
    <history><date-created year="2007" month="04" day="18"/></history>
    <dbcollection>SNCABS</dbcollection>
    <dbcollection>Scopusbase</dbcollection>
  </item-info>
  <head>
    <citation-info>
      <citation-type code="ar"/>
      <citation-language xml:lang="en"/>
  …..
  <!-- Reference to author -->
  <author seq="3" auid="7003372933">
    <ce:initials>P.</ce:initials><ce:indexed-name>Barret P.</ce:indexed-name>
    <ce:surname>Barret</ce:surname><ce:given-name>Pierre</ce:given-name>
    <preferred-name>
      <ce:initials>P.</ce:initials>
      <ce:indexed-name>Barret P.</ce:indexed-name>
      <ce:surname>Barret</ce:surname><ce:given-name>Pierre</ce:given-name>
    </preferred-name>
    <ce:e-address type="email">XXX@YYYY.inra.fr</ce:e-address>
  </author>
  <!-- Reference to affiliation -->
  <affiliation country="fr" afid="60001542">
    <organization>Plateforme de Transgénèse du Blé</organization>
    <organization>UMR ASP 1095 INRA</organization>
    <organization>Université Blaise Pascal</organization>
    <city-group>63100 Clermont-Ferrand</city-group>
  </affiliation>
  <references count="27">
    …..
  </references>
</bibrecord>

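A publication record ties the three identifiers together: publication ID, author ID (`auid`) and affiliation ID (`afid`). The sketch below extracts those links from a trimmed, well-formed record. Two things are assumptions: the excerpt on the slide does not show the namespace declaration for the `ce` prefix, so a placeholder URI is used, and the `author-group` wrapper is borrowed from the Scopus Custom Data example rather than from this snippet.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (assumption): the real URI should be taken
# from the delivered data, it is not shown in the slide excerpt.
NS = {"ce": "urn:example:ce"}

RECORD_XML = """
<bibrecord xmlns:ce="urn:example:ce">
  <item-info>
    <itemidlist><itemid idtype="SCP">34147094726</itemid></itemidlist>
  </item-info>
  <head>
    <author-group>
      <author seq="3" auid="7003372933">
        <ce:indexed-name>Barret P.</ce:indexed-name>
      </author>
      <affiliation country="fr" afid="60001542">
        <organization>UMR ASP 1095 INRA</organization>
      </affiliation>
    </author-group>
  </head>
</bibrecord>
"""


def extract_links(record_xml):
    """Return (publication_id, author_id, indexed_name, affiliation_ids) tuples."""
    record = ET.fromstring(record_xml)
    publication_id = record.findtext(".//itemid[@idtype='SCP']")
    links = []
    for group in record.iter("author-group"):
        # Authors in a group share the affiliations listed in that group.
        afids = [a.get("afid") for a in group.findall("affiliation")]
        for author in group.findall("author"):
            name = author.findtext("ce:indexed-name", namespaces=NS)
            links.append((publication_id, author.get("auid"), name, afids))
    return links


links = extract_links(RECORD_XML)
print(links)
```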
Scopus Custom Data
Example of XML data
<author-group>
  <author seq="1" auid="7005613516">
    <ce:initials>A.</ce:initials>
    <ce:indexed-name>Rothschild A.</ce:indexed-name>
    <ce:surname>Rothschild</ce:surname>
    <ce:given-name>Avner</ce:given-name>
    <preferred-name>
      <ce:initials>A.</ce:initials>
      <ce:indexed-name>Rothschild A.</ce:indexed-name>
      <ce:surname>Rothschild</ce:surname>
      <ce:given-name>Avner</ce:given-name>
    </preferred-name>
    <ce:e-address type="email">avner@mit.edu</ce:e-address>
  </author>
  <author seq="2" auid="8625399100">
    <ce:initials>S.J.</ce:initials>
Custom Data is:
• A big bucket of highly structured XML items
• Extracted directly from Scopus
• Accompanied by the articles’ cited-by counts
• Supported by extensive documentation and test data upon request
• Delivered via FTP or shipped on portable (USB) drives
• Scopus contains ~42 million items
• In principle all articles can be ordered
• Custom Data can be grouped using the following criteria:
  • On ASJC code (All Science Journal Classification Code) (see next slide)
  • Per country
  • List of countries
  • Per year
  • Range of years
• Further refining is possible in close cooperation with the Product Team
• Certain fields can be taken out if preferred:
  • Abstracts
  • References
  • Etc.

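The grouping criteria listed above (ASJC code, one or more countries, a year or range of years) amount to a filter over item metadata. A minimal sketch of such a selection, assuming a hypothetical in-memory `Item` view of each record; the real delivery is XML and the selection is made by the supplier, so this only illustrates the logic:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical, minimal view of one Custom Data item."""
    eid: str
    asjc_codes: tuple
    countries: tuple
    year: int


def select(items, asjc=None, countries=None, years=None):
    """Filter items by ASJC code, a list of country codes, and an inclusive
    (first_year, last_year) range. A criterion left as None is not applied."""
    selected = []
    for item in items:
        if asjc is not None and asjc not in item.asjc_codes:
            continue
        if countries is not None and not set(countries) & set(item.countries):
            continue
        if years is not None and not (years[0] <= item.year <= years[1]):
            continue
        selected.append(item)
    return selected


ITEMS = [
    Item("item-1", ("1306",), ("fr",), 2007),
    Item("item-2", ("1306", "1315"), ("es", "nl"), 2009),
    Item("item-3", ("1315",), ("fr",), 1995),
]

by_subject_and_years = [i.eid for i in select(ITEMS, asjc="1306", years=(2005, 2010))]
by_countries = [i.eid for i in select(ITEMS, countries=["fr", "es"])]
print(by_subject_and_years)
print(by_countries)
```

A single year is expressed as a range with equal bounds, for example `years=(2009, 2009)`.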
A wide variety of Web Services and APIs
• SOAP and REST: simple and accessible for low-level development
• Different service levels supported
• Access to different content types:
  • Hub
  • ScienceDirect articles
  • Scopus abstracts, author profiles, affiliation profiles
• Both search and retrieval
• XML and other formats supported

A significant part of research information resides with publishers
• Publishers have a lot of information about publications and researchers
• Publishers have been dealing with research information for many years
  • Early adopters of XML and database technologies
  • At the forefront of changes taking place in the research area
• More and more publishers, certainly Elsevier, are working closely with institutions on topics related to research information management and performance evaluation

euroCRIS and CERIF as seen by Elsevier
• CERIF as a standardized format is a great initiative
• Elsevier is happy to partner with euroCRIS to improve, maintain and update the ‘standard’
• Elsevier, on the other hand, is “agnostic” to CRIS implementations
  • What is the future of data models, given evolving technologies?
  • Do you need one today? Do you care about how systems are implemented and set up?
  • Shouldn’t the focus be on the interface/exchange layer? With web services according to a standard (CERIF), back-end systems are less relevant

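The argument that the exchange layer matters more than the back end can be made concrete: whatever the internal store, a service only needs to emit records in the agreed format. The sketch below serialises one author profile into a simplified, CERIF-inspired fragment. The element names are borrowed from CERIF's person and organisation-unit entities (`cfPers`, `cfPers_OrgUnit`, `cfOrgUnitId`), but the nesting is simplified and is an assumption; it is not claimed to validate against any CERIF XML schema release, which should be consulted for a real export.

```python
import xml.etree.ElementTree as ET


def to_cerif_like(author_id, family_names, first_names, affiliation_ids):
    """Build a simplified, CERIF-inspired XML fragment for one person.

    Not schema-valid CERIF: structure and element selection are assumptions.
    """
    root = ET.Element("CERIF")
    person = ET.SubElement(root, "cfPers")
    ET.SubElement(person, "cfPersId").text = author_id
    ET.SubElement(person, "cfFamilyNames").text = family_names
    ET.SubElement(person, "cfFirstNames").text = first_names
    for affiliation_id in affiliation_ids:
        # One link element per person-to-organisation relationship.
        link = ET.SubElement(person, "cfPers_OrgUnit")
        ET.SubElement(link, "cfOrgUnitId").text = affiliation_id
    return ET.tostring(root, encoding="unicode")


fragment = to_cerif_like("7401581436", "MacDonald", "Alistair W.", ["106499546"])
print(fragment)
```

The caller does not need to know whether the values came from an XML-native store or a relational database, which is the sense in which back-end systems become less relevant.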
Opportunities for euroCRIS and Elsevier
• Work collaboratively on further standardization of CERIF
  • Ensure completeness of research information exchanged through CERIF
  • Adopt CERIF as one of the export formats, delivering straight into local systems (CRIS or non-CRIS)
  • Elsevier and euroCRIS to help the research management community accelerate the population of local systems and repositories
  • Expand CERIF to include metric-based report information for performance evaluation
  • Exchange technology and knowledge for potential CRIS implementation recommendations
  • Accelerate integration of Elsevier’s and other vendors’ products and their data with local systems (e.g. HR, etc.)

Thanks
For questions and/or follow up:
M’hamed el Aisati
m.aisati@elsevier.com