Presentation

Technical Workshop
On Line
November 31, 2004
London
Technical Workshop
London – On Line 2004
1
Agenda
10:00 - 10:45  System status & issues
10:45 - 11:15  New system features
11:15 - 11:30  XML Interfaces
11:30 - 12:15  New initiatives & schema developments
12:15 - 12:30  Questions ???
System status
[Diagram: system architecture. Labels: Java; 100 Mb full duplex; Cisco PIX (x2); Dell 2650 (x3); Database; Sun V440, 4 x 1.28 GHz SPARC III, 16 GB memory; 1 Gb switch; Sun 3510 DAS, Fibre Channel storage]
Database hardware running at ~ 60% capacity
During peak loads the Web server runs at >90% (single CPU)
Dells running RH Linux E 3.0
Sun running Solaris 9
Migration to Oracle 9i 95% completed
Redundant IP handoff from MCI to complete by Dec 2004
Query response times
Single query in a request
Great: 0.5s, Good: 1.5s, Slow: 3.5s, Bad: 6s+
Five queries in a request
Great: 2.0s, Good: 5.5s, Slow: 10s, Bad: 15s+
We’re investigating the relationship between query request
load, deposit processing load and query response time.
We’ll be adding software load balancing to the Web front end.
To help:
• Limit the number of concurrent requests!
• Place more than one query in a request
• Use the batch upload
Weekly query load - hourly
[Chart: hourly query counts, Mon-Sun, Oct 4-10; vertical axis 0 to 40000]
Weekly query load - hourly
[Chart: hourly query counts, Mon-Sun, Oct 11-17; vertical axis 0 to 40000]
Weekly query load - hourly
[Chart: hourly query counts, Mon-Sun, Oct 18-24; vertical axis 0 to 30000]
Batch processing times
Conflicts
www.crossref.org =>Members Area => System Reports => Conflict Report
Conflicts
===========================================
Created: 2004-10-21 04:38:03.0
ConfID: 139239
CauseID: 110986773
OtherID: 76436491,
JT: Scottish Journal of Theology
MD: Marsh, 55 ,3,253,2002,In defense of a self: the theological …
DOI: 10.1017/S0336930602000313 (139239-null 139291-null )
DOI: 10.1017/S0036930602000315 (139239-null 139291-null )
===========================================
Reading the report:
JT: journal title
MD: metadata used for both DOIs
DOI lines: 2 DOIs for the same article
139291: the DOIs are in a second conflict
-null: the state of the conflict ID (null => unresolved)
Conflicts: What to do about them

Send us an email instructing how to resolve the conflict
• Make one DOI primary, all others into aliases

  Primary DOI                          DOI to be aliased to primary     Conflict ID
  10.1016/j.clindermatol.2003.11.001   10.1016/S0738-081X(03)00103-2    104155
  10.1016/j.clindermatol.2003.12.026   10.1016/S0738-081X(03)00150-0    104157
  10.1016/j.clindermatol.2003.12.031   10.1016/S0738-081X(03)00153-6    104159

• Resolve the conflict without doing anything

  Conflict ID: 101115, 103044, 103048, 105650

Resend one of the DOIs with new (different) metadata
(Soon) log in to doi.crossref.org and resolve them yourself
Conflicts: prevent them
<journal_article publication_type="full_text">
<titles><title>Phys. Rev. A</title></titles>
<contributors>
<person_name sequence="first" contributor_role="author">
<given_name>Petr O.</given_name>
<surname>Fedichev</surname>
</person_name>
</contributors>
<publication_date media_type="online">
<month>04</month>
<year>2004</year>
</publication_date>
<publisher_item>
<item_number item_number_type="sequence-number">
PhysRevA.69.049902
</item_number>
</publisher_item>
<doi_data>
<doi>10.1103/PhysRevA.69.049902</doi>
<timestamp>20040412120604</timestamp>
<resource>http://link.aps.org/doi/10.1103/PhysRevA.69.049902</resource>
</doi_data>
</journal_article>
Issues

Data quality

Missing fields (publishing ahead of print is OK, but
update the record when the data is available)

First author being mixed up with other contributors

Journal titles

Full titles in query without ISSN may cause misses

Two recent fuzzy match changes have had an effect
1. Eliminated a dangerous rule that could return false
positives when title and ISSN did not match well
2. Lowered the threshold on matching long titles
Issues

Depositing a new title
• If you send in 2 files at the same time with DOIs for a
new title, it may result in two title entries in CrossRef

DOIs for journal titles and issues
• These can be created in the <journal_metadata> and
<issue_metadata> tags

Page numbers with alpha characters
• ‘S110’ or ‘110S’ is handled better than ‘30-1’
• 110 in a query will match S110 or 110S
• 30 in a query will not match 30-1
• 20-a should be OK
• ‘69F-a’ will only match an exact string
Example DOIs: 10.1016/S0003-4975(02)04151-6, 10.1029/2002GL014973
Issues

Query results in XML format
servlet/query?usr=<username>&pwd=<password>&
type=<queryType>&format=<resultFormat>&qdata= ….

Result format can be: piped, xml, xsd_xml

xml is the old legacy XML format (no schema)

xsd_xml has a schema and includes all new features
http://www.crossref.org/qrschema/crossref_query_output2.0.xsd
Use of the legacy XML format should be discontinued
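A minimal sketch of assembling the query URL described above with format=xsd_xml. The class and method names are hypothetical, the host doi.crossref.org is assumed from the other servlet URLs in this deck, and the type value is simply passed through because the slides do not list its allowed values.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class QueryUrlSketch {
    // Assumed host: the slide shows only the relative path "servlet/query".
    static final String BASE = "http://doi.crossref.org/servlet/query";

    // URL-encodes one parameter value as UTF-8.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Assembles the request URL from the five parameters named on the slide.
    static String build(String usr, String pwd, String type, String format, String qdata) {
        return BASE + "?usr=" + enc(usr) + "&pwd=" + enc(pwd)
                + "&type=" + enc(type) + "&format=" + enc(format)
                + "&qdata=" + enc(qdata);
    }

    public static void main(String[] args) {
        // "q" is a placeholder type value, not taken from the slides.
        System.out.println(build("USR", "PWD", "q", "xsd_xml",
                "0277786X|Proceedings of SPIE||4272||133|2001|||"));
    }
}
```

Encoding qdata matters because pipe'd queries contain '|' and spaces, which are not safe to send raw in a URL.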
Issues

Each batch ID / query key combination must be unique.
<head>
<doi_batch_id>SPI_2004-09-15_09-48-22</doi_batch_id>
<doi_data>
<doi>10.1007/BF00393374</doi>
<resource><![CDATA[ http://www.springerlink.com/index/10.1007/BF00393374 ]]>
</resource>
</doi_data>
<citation_list>
<citation key="CR1">
<journal_title>Appl Environ Microbiol</journal_title>
<author>RI Amann</author>
<volume>56</volume>
<citations_diagnostic>
<citation key="CR1" status="warning">
A stored query with doi_batch_id=SPI_2004-09-15_09-48-22
and query_key=CR1 already
exists for the same depositor
</citation>
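One way to keep each batch ID / query key combination unique is to stamp the batch ID with the submission time and to track the pairs already sent. The sketch below is only an illustration under that assumption: the class, its methods and the in-memory set are hypothetical, and the ID shape simply mirrors the SPI_2004-09-15_09-48-22 example.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashSet;
import java.util.Set;

final class BatchIdSketch {
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd_HH-mm-ss");

    // Every "batchId|queryKey" pair this depositor has already used (in memory only).
    private final Set<String> used = new HashSet<>();

    // Formats an ID in the same shape as the slide's example.
    static String newBatchId(String prefix, LocalDateTime when) {
        return prefix + "_" + STAMP.format(when);
    }

    // Returns true the first time a pair is seen, false if it would be a duplicate.
    boolean register(String batchId, String queryKey) {
        return used.add(batchId + "|" + queryKey);
    }

    public static void main(String[] args) {
        BatchIdSketch ids = new BatchIdSketch();
        String batch = newBatchId("SPI", LocalDateTime.now());
        System.out.println(batch + " CR1 first use: " + ids.register(batch, "CR1"));
        System.out.println(batch + " CR1 reuse:     " + ids.register(batch, "CR1"));
    }
}
```

A second-resolution timestamp is only unique if a depositor never starts two batches in the same second; a counter or random suffix would be needed otherwise.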
Issues

Upload timeouts
• Large (250K+) files may not be completing the upload
• No HTTP response is returned
• Only a few users seem to be affected (some can
upload 1M+ files)

Solutions (& workarounds)
• Break the files up (some are doing 1 DOI per file)
• CrossRef to investigate session timeouts

Let me know if you’re having this problem
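Breaking a large deposit into smaller files can be sketched as grouping records under a size cap. This is an assumption-laden illustration: the class and method are hypothetical, each string stands for one record's XML, and wrapping each group in its own deposit file (with its own batch ID) is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DepositSplitter {
    // Groups records so that each group's UTF-8 size stays at or under maxBytes.
    // A single record larger than maxBytes still gets a group of its own.
    static List<List<String>> split(List<String> records, int maxBytes) {
        List<List<String>> groups = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int size = 0;
        for (String record : records) {
            int len = record.getBytes(StandardCharsets.UTF_8).length;
            if (!current.isEmpty() && size + len > maxBytes) {
                groups.add(current);          // close the full group
                current = new ArrayList<>();
                size = 0;
            }
            current.add(record);
            size += len;
        }
        if (!current.isEmpty()) {
            groups.add(current);              // keep the final partial group
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("aaaa", "bbbb", "cc");
        System.out.println(split(records, 8));
    }
}
```

Setting maxBytes comfortably below the 250K mark mentioned above leaves room for the file's header and wrapper elements.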
DX contingency planning

CrossRef will be running a secondary Handle system and
a DOI proxy resolver.

The secondary Handle server receives updates from
the DOI primary (at CNRI) about 15 minutes after
DOIs are created/updated by CrossRef
[Diagram: deposit flow, steps 1 to 3]

The proxy will share the load going to http://dx.doi.org
(DNS subdomain will direct traffic to several IPs)
New features

Unified query

Tracking ID

Open Channel Interface

Forward linking

Local hosting changes
Unified query

Journals, conf. proceedings and books have different
metadata => queries must examine different fields

… and it gets worse
• Proceedings have event name and an event acronym
• Proceedings have event date and publication date
• Proceedings and Books have ISBNs and/or ISSNs

The real problem is: it’s hard to tell from a reference what
kind of item is being referenced
Unified query

The solution is to have one query that examines
everything and returns the right result

Step 1: change the current ‘journal’ query to have the
‘title’ field also examine proceedings event name and
event acronym and the ‘issn’ field examine
proceedings ISSNs
‘journal’ query (only one title):
0277786X|Proceedings of SPIE||4272||133|2001|||
‘proceedings’ result (two title fields; series is empty):
0277786X||Proceedings of SPIE |Srinivasan|4272||133|2001||full_text ||10.1117/12.430790
Tracking IDs
http://doi.crossref.org/servlet/submissionDownload?usr=<USR>&
pwd=<PWD>&doi_batch_id=NJ028011-b406513a&type=result
OR
http://doi.crossref.org/servlet/submissionDownload?usr=<USR>&
pwd=<PWD>&file_name=b406513a_doi.xml&type=result
Returns the log file
http://doi.crossref.org/servlet/submissionDownload?usr=<USR>&
pwd=<PWD>&file_name=b406513a_doi.xml&type=contents
Returns the XML deposit file
Open Channel Interface

‘Premium’ fee has been dropped
• Available on a case-by-case basis
• Continuous connection to CrossRef for pipe’d queries
• Response time can be 10X better than HTTP queries
import java.io.*;
import java.net.*;

// host, port and qData are supplied by the caller
Socket socket = new Socket(host, port);
// 'true' enables autoflush, so println() sends the query immediately
PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
out.println(qData);           // send one pipe'd query
String line = in.readLine();  // read one line of the response
Forward Linking

Forward linking deposits are simply the list of
references listed in the bibliography
• You most likely already send this data to CrossRef
in the form of queries (in fact, reference deposits
look very much like queries)
• There are two ways to deposit references for an article
1. Send them in with the article’s metadata
2. Send them in separately after an article’s DOI
and metadata are deposited
Forward Linking – deposit log
<?xml version="1.0" encoding="UTF-8"?>
<doi_batch_diagnostic status="completed">
<submission_id>115276193</submission_id>
<batch_id>4219-com.wiley.cch.processes.JournalToDOI16047.xref</batch_id>
<record_diagnostic status="Success">
<doi>10.1002/(ISSN)1097-0134</doi>
<msg>Successfully updated in handle</msg>
</record_diagnostic>
<record_diagnostic status="Success">
<doi>10.1002/prot.20276</doi>
<msg>Successfully added</msg>
<citations_diagnostic>
<citation key="10.1002/prot.20276-BIB1" status="stored_query" />
<citation key="10.1002/prot.20276-BIB2" status="resolved_reference">10.1006/jsbi.2001.4428</citation>
<citation key="10.1002/prot.20276-BIB3" status="resolved_reference">10.1110/ps.0227803</citation>
<citation key="10.1002/prot.20276-BIB4" status="resolved_reference">10.1006/jmbi.1990.9999</citation>
<citation key="10.1002/prot.20276-BIB5" status="stored_query" />
<citation key="10.1002/prot.20276-BIB6" status="stored_query" />
<citation key="10.1002/prot.20276-BIB7" status="stored_query" />
Forward Linking – query
<?xml version = "1.0" encoding="UTF-8"?>
<query_batch version="2.0" xmlns =
"http://www.crossref.org/qschema/2.0">
<head>
<email_address>ckoscher@crossref.org</email_address>
<doi_batch_id>fl_001</doi_batch_id>
</head>
<body>
<fl_query alert='false'>
<doi>10.1110/ps.0227803</doi>
</fl_query>
</body>
</query_batch>
Note: only user ‘coldspring’ can run this query
Multiple resolution

Multiple resolution presents choices to the user from the
site of the link
• An XML CrossRef deposit sets up the menu and
multiple links (sample)
Multiple resolution - deposit

Normal link is built with the <a> (anchor) tag
<a href="http://dx.doi.org/10.5555/sample-doi">The Link Text</a>

Multiple resolution link is built with the <script> tag

One instance of <script> to load the menu library (the menu builder code)
<script src="http://www.crossref.org/MRLoader/milonic_src.js">
</script>

For each link (the DOI, then the link text after the ‘?’)
<script src="http://www.crossref.org/MRLoader/MR/10.5555/sample-doi?The%20Link%20Text">
</script>
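Generating the per-link <script> tag from a DOI and its link text might look like the sketch below. The class and method names are hypothetical; the loader URL is the one shown on the slide, and the DOI is inserted as-is, which assumes it contains no characters that need escaping in an HTML attribute.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class MrLinkSketch {
    // Loader prefix as shown on the slide.
    static final String LOADER = "http://www.crossref.org/MRLoader/MR/";

    // Percent-encodes the link text. URLEncoder emits '+' for spaces, so those
    // are rewritten to %20 to match the slide's example (a literal '+' in the
    // text has already become %2B and is not affected).
    static String encodeText(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Builds the <script> tag that replaces a normal <a> link.
    static String scriptTag(String doi, String linkText) {
        return "<script src=\"" + LOADER + doi + "?" + encodeText(linkText) + "\"></script>";
    }

    public static void main(String[] args) {
        System.out.println(scriptTag("10.5555/sample-doi", "The Link Text"));
    }
}
```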
Multiple resolution - deployment

Multiple resolution deployment requires three things:
1. Registration of multiple targets for a given DOI
2. Operation of the MRLoader resolver
3. Construction of MR links on Web pages

Everyone has a part to play
1. Publishers that ‘own’ the target DOI must register
(or authorize a 3rd party to register) multiple targets
2. CrossRef and/or the content owner publisher must
operate the MRLoader resolver
3. Every Web page that links to the MR enabled DOI
must replace <a> tags with <script> tags
Web Deposit Form



Allows users to enter the metadata for a deposit
using a Web form. No XML skills required
Supports journal articles; now working to add
conference proceedings and books. Later, will
add reference deposits and components
Must know your CrossRef member login
www.crossref.org => Member Area => Member Resources => web deposit form
http://www.crossref.org/webDeposit
XML Queries
• XML Queries provide a more structured format and
enable features unavailable in pipe’d queries
1. Enable multiple hits
2. Control over which fields are fuzzy matched
3. Forward linking queries
4. Query match alerts
Metadata query
<?xml version = "1.0" encoding="UTF-8"?>
<query_batch version="1.0" xmlns = "http://www.crossref.org/qschema/2.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<head>
<email_address>name@address.com</email_address>
<doi_batch_id>SomeTrackingID2</doi_batch_id>
</head>
<body>
<query key="MyKey1" enable-multiple-hits="false"
forward-match="false">
<issn>10408746</issn>
<journal_title>Current Opinion in Oncology</journal_title>
<author>Chauncey</author>
<volume>13</volume>
<issue>1</issue>
<first_page>21</first_page>
<year>2001</year>
</query>
</body>
</query_batch>
• Order is important
• Fields can be omitted
Fuzzy match control
• Fields with a “match” attribute can be controlled
ISSN: optional, exact
Journal/Volume: optional, fuzzy, exact (default="fuzzy")
Series Title: optional, null, fuzzy, exact (default="fuzzy")
Author: optional, fuzzy, null, exact (default="fuzzy")
Volume: optional, fuzzy, exact (default="fuzzy")
Issue: optional, fuzzy, exact (default="fuzzy")
Page: optional, null, exact (default="optional")
Example:
<journal_title match="exact">Current Opinion in Oncology</journal_title>
A word on special characters
• Metadata deposits are supposed to be UTF-8 Unicode
é = &#233; (decimal) = &#xE9; (hex)

Record 10.5555/char_test_001
ISSN: 12345678
Title: Test Publication
Author: Joénes
Volume: 12
Issue: 1
Page: S125
Year: 1999

Queries

Works because page is supplied:
<journal_title>Test Publication</journal_title>
<author>Joenes</author>
<volume>12</volume>
<first_page>125</first_page>
<year>1999</year>

Does NOT work (Arrggghh…):
<journal_title>Test Publication</journal_title>
<author>Joenes</author>
<volume>12</volume>
<year>1999</year>

Works because correct author is supplied:
<journal_title>Test Publication</journal_title>
<author>Joénes</author>
<volume>12</volume>
<year>1999</year>
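The equivalence between a character such as é and its decimal and hex numeric character references can be sketched as below. The class and method are hypothetical and only illustrate the notation; the sketch does not escape XML markup characters such as & or <.

```java
final class CharRefSketch {
    // Replaces every non-ASCII code point with a numeric character reference,
    // in hex (&#xE9;) or decimal (&#233;) form. ASCII is copied unchanged.
    static String toCharRefs(String text, boolean hex) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                sb.appendCodePoint(cp);
            } else if (hex) {
                sb.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String author = "Jo\u00e9nes"; // the author name from the test record
        System.out.println(toCharRefs(author, false));
        System.out.println(toCharRefs(author, true));
    }
}
```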
Stored Queries
• CrossRef remembers queries that do not initially
match and sends an email notice when they finally do.
<?xml version = "1.0" encoding="UTF-8"?>
<query_batch version="1.0" xmlns = "http://www.crossref.o…">
<head>
<email_address>ckoscher@crossref.org</email_address>
<doi_batch_id>fm_429_001</doi_batch_id>
</head>
<body>
<query key="fm_1" enable-multiple-hits="false"
forward-match="true">
<journal_title>Test Publication</journal_title>
<author>Anderson</author>
<volume>33</volume>
<issue>9</issue>
<first_page>125</first_page>
<year>2002</year>
</query>
</body>
</query_batch>
Log message when query is submitted
<?xml version="1.0" encoding="UTF-8" ?>
<crossref_result version="2.0" xmlns="http://www.crossref.org/qrschema/2.0" …">
<query_result>
<head>
<email_address>ckoscher@crossref.org</email_address>
<doi_batch_id>fm_429_001</doi_batch_id>
</head>
<body>
<query status="unresolved">
<journal_title>Test Publication</journal_title>
<author>Anderson</author>
<volume>33</volume>
<issue>9</issue>
<first_page>125</first_page>
<year>2002</year>
<msg>Query stored in CrossRef for forward matching</msg>
</query>
</body>
</query_result>
</crossref_result>
Results email
Subject: Crossref stored query match: doi_batch_id= fm_429_001 ; query_key= fm_1
<?xml version = "1.0" encoding = "UTF-8"?>
<crossref_result version="2.0" xmlns="http://www….-instance… ">
<query_result>
<head>
<email_address>ckoscher@crossref.org</email_address>
<doi_batch_id> fm_429_001 </doi_batch_id>
</head>
<body>
<query key="fm_1" status="resolved">
<doi>10.5555/forward_match_test_2</doi>
<issn>12345678</issn>
<journal_title match="exact">Test Publication</journal_title>
<author match="exact">Smith</author>
<volume match="exact">3</volume>
<issue>2</issue>
<first_page match="exact">100</first_page>
<year match="exact">1985</year>
<publication_type>full_text</publication_type>
</query>
</body>
</query_result>
</crossref_result>
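A receiving system could pull the query key and matched DOI out of a result like the one above. The sketch below assumes the XML body has already been extracted from the email; the class and method names are hypothetical, and it uses only the element and attribute names visible in the sample.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class StoredQueryResultSketch {
    // Returns "key -> doi" for the first <query status="resolved">, or null if none.
    static String firstResolved(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList queries = doc.getElementsByTagName("query");
            for (int i = 0; i < queries.getLength(); i++) {
                Element query = (Element) queries.item(i);
                NodeList dois = query.getElementsByTagName("doi");
                if ("resolved".equals(query.getAttribute("status")) && dois.getLength() > 0) {
                    return query.getAttribute("key") + " -> "
                            + dois.item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse result XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<crossref_result><query_result><body>"
                + "<query key=\"fm_1\" status=\"resolved\">"
                + "<doi>10.5555/forward_match_test_2</doi></query>"
                + "</body></query_result></crossref_result>";
        System.out.println(firstResolved(xml));
    }
}
```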
Polling for Query Matches
• You can interrogate the system to get a list of
queries that may have matched.
http://doi.crossref.org/servlet/downloadStoredQueries?usr=creftest&pwd=c53test&startDate=2004-03-31&endDate=2004-05-03
Forward Linking Queries
• Forward linking is an ‘opt-in’ service
• Fees: a surcharge on the annual membership
• Permission must be enabled by a CrossRef
administrator
Forward Linking Query Example
Reference deposit #1 (log)
Reference deposit #2 (log)
<?xml version = "1.0" encoding="UTF-8"?>
<query_batch version="2.0" xmlns = "http://www.crossref.org/qschema/2.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.crossref.org/qschema/2.0
http://www.crossref.org/qschema/crossref_query_input2.0.xsd">
<head>
<email_address>ckoscher@crossref.org</email_address>
<doi_batch_id>fl_001</doi_batch_id>
</head>
<body>
<fl_query alert="true">
<doi>10.1097/00001622-200101000-00005</doi>
</fl_query>
</body>
</query_batch>
Sample: forward linking query results
Forward Linking Alerts
• Once you’ve made a forward link query, deposit of
any new articles that cite the DOI you requested will
generate an alert email
Reference deposit #3 (log)
Email alert notice
New Initiatives

Components

Extended content types

Plans for 2005
Component Deposits
What is a component?
Components are considered to be sub-items
that are part of the construction of an article,
chapter or conference paper, or that provide
supporting (sometimes called supplemental)
information.
These items in and of themselves are not typically
cited in a bibliography, but they are cited within the
text
NOTE: Title DOIs and Issue DOIs are not components
They should be deposited in the journal, conf-proc or
book metadata
Component Deposits
Why create DOIs for components?
• To improve link management
• Build persistent links
• Use multiple resolution on them
How are components deposited?
• Schema version 3.0.3 supports components
• Deposit as part of an article’s metadata or
standalone
(note: a parent DOI must be specified)
Component Deposits
What Component services will CrossRef offer?
• Near term
  - Just the registration of the DOI
• Long term
  - Some form of lookup service (e.g. query)
  - Expanded component metadata
    (licensing, copyright …?)
Component Deposits
<journal_article>
...
<doi_data>
<doi>10.9876/S0003695199019014</doi>
<resource>http://ojps.aip.org:18000/link/?apl/74/1/76/ab</resource>
</doi_data>
<component parent_relation="isPartOf">
<description><b>Figure 1:</b> This is the caption of the first figure...</description>
<format mime_type="image/jpeg">Web resolution image</format>
<doi_data>
<doi>10.9876/S0003695199019014/f1</doi>
<resource>http://ojps.aip.org:18000/link/?apl/74/1/76/f1</resource>
</doi_data>
</component>
<component parent_relation="isReferencedBy">
<description><b>Video 1:</b> This is a description of the video...</description>
<format mime_type="video/mpeg"/>
<doi_data>
<doi>10.9876/S0003695199019014/video1</doi>
<resource>http://ojps.aip.org:18000/link/?apl/74/1/76/video1</resource>
</doi_data>
</component>
</journal_article>
Component Deposits

Alternatively components may be deposited
separately from their ‘parent’ item’s metadata
<body>
<sa_component>
<doi>10.9876/molcell/10/4</doi>
<component parent_relation="isPartOf">
<description>Cover Image, Molecular Cell, Volume 10, Issue 4, January 2004
</description>
<format mime_type="image/tiff"/>
<doi_data>
<doi>10.9876/molcell/10/4/cover</doi>
<resource>http://molcell.org/10/4/cover</resource>
</doi_data>
</component>
</sa_component>
</body>
Expanded Content Types
• Metadata study now underway
  - Dissertations, technical reports, working
    papers, standards, patents and databases
• Implementation to occur in early 2005
  - XML schema will update to version 4.0
  - Deposits
  - Query services
    Extend current query mechanism?
    ‘Firewall’ current content?
Expanded Content Types
Dissertations
<dissertation>
<person_name>
<titles>
<acceptance_date>
<university>
<name>
<location>
<department>
<degree>
<publisher_item>
<doi_data>



Add <advisor> elements
Review NDLTD metadata standards
Survey T & D organizations
• Cal Tech
• ProQuest
• Texas A&M
• ?
Expanded Content Types
Reports
<report>
<contributors>
<titles>
<publisher>
<publication_date>
<publisher_item>
<series_metadata>
<isbn>
<issn>
<research_organization>
<sponsor>
<organization>
<contract>
<doi_data>



Drop ‘technical’ label
Support chapters?
Survey organizations
• AGU
• NASA/JPL
• Other government?
• ?
Expanded Content Types
Working papers
<report>
<contributors>
<titles>
<publisher> or <university>
<publication_date>
<publisher_item>
<series_metadata>
<isbn>
<issn>
<research_organization>
<sponsor>
<organization>
<contract>
<doi_data>



Conflicts with published articles
Include series metadata?
Survey organizations
• ?
Expanded Content Types
Standards




Not included in initial analysis, added after
annual member meeting
• Interest from IEEE
Metadata draft development TBD
Accredited vs. consortium standards
Survey organizations
• NISO, ANSI, BSI, ISO
• IEEE
• ConsortiumInfo.org
Plans for 2005

Modify / improve page number processing

Normalized XML

Modularize CrossRef system

Implement Expanded Content Types

Others
Page & article numbers

CrossRef deposit schema allows for first page and article
number
<pages><first_page>
<publisher_item><item_number item_number_type="article-number">
• Article number will be used if no first page is provided

A query has only one ‘page’ field and will search either
first_page or article_number but not both

Some articles have both: both are presented to the reader
• Change the query logic to search both fields
• Add an XML query field for ‘article_number’

Page numbers (and article numbers?) are not numbers

Would a full fuzzy match on page improve matching rates?
Normalized XML

Journal, proceedings and book content is stored in 2 places in the
CrossRef database
1. Subset in tables/columns to support query operations
2. Entire deposit as a CLOB (not easily accessed)

XML query results are specialized for each content type
(<journal_cite><conf_cite><book_cite>)

Reduce all content type info to a simpler ‘one size fits all’ schema

Store each DOI record as XML in a database column (memo?)

Facilitates access to all metadata (e.g. complete ‘lite’ weight local
host files)

Yields a more consistent XML query result
Modularize

Current system is a monolith
• One database supports everything

Separate operations to improve performance and scalability
• Deposits & updates
• Queries
• Reports

Additional benefits
• Local host the CrossRef query system, not just the
metadata
Other

Components
• Query mechanism
• Expand the metadata (license, rights …)

Production implementation of multiple resolution
• Integrate into the deposit process
• Implement a local host type option
• Automatic appropriate copy service
Questions / Discussion