Technical Workshop On Line
November 31, 2004
London
Technical Workshop London – On Line 2004

Agenda
10:00 - 10:45 System status & issues
10:45 - 11:15 New system features
11:15 - 11:30 XML interfaces
11:30 - 12:15 New initiatives & schema developments
12:15 - 12:30 Questions

System status
[Diagram: system layout. Java; 100 Mb full duplex; two Cisco PIX firewalls; three Dell 2650 servers; 1 Gb switch; database on a Sun V440 (4 x 1.28 GHz SPARC III, 16 GB memory) with Sun 3510 DAS Fibre Channel storage]
• Database hardware runs at ~60% capacity during peak loads
• Web server runs at >90% (single CPU)
• The Dells run Red Hat Enterprise Linux 3.0; the Sun runs Solaris 9
• Migration to Oracle 9i is 95% complete
• Redundant IP handoff from MCI to be completed by Dec 2004

Query response times
• Single query in a request: Great: 0.5 s, Good: 1.5 s, Slow: 3.5 s, Bad: 6+ s
• Five queries in a request: Great: 2.0 s, Good: 5.5 s, Slow: 10 s, Bad: 15+ s
We are investigating the relationship between query request load, deposit processing load, and query response time.
We will be adding software load balancing to the Web front end.
To help:
• Limit the number of concurrent requests!
• Place more than one query in a request
• Use the batch upload

[Figure: Weekly query load, hourly, Mon-Sun, Oct 4-10 (y-axis 0 to 40,000)]

[Figure: Weekly query load, hourly, Mon-Sun, Oct 11-17 (y-axis 0 to 40,000)]

[Figure: Weekly query load, hourly, Mon-Sun, Oct 18-24 (y-axis 0 to 30,000)]

[Figure: Batch processing times]

Conflicts
www.crossref.org => Members Area => System Reports => Conflict Report

Conflicts
===========================================
Created: 2004-10-21 04:38:03.0
ConfID: 139239 CauseID: 110986773 OtherID: 76436491,
JT: Scottish Journal of Theology
MD: Marsh, 55 ,3,253,2002,In defense of a self: the theological …
DOI: 10.1017/S0336930602000313 (139239-null 139291-null )
DOI: 10.1017/S0036930602000315 (139239-null 139291-null )
===========================================
• JT: journal title
• MD: metadata used for both DOIs
• Two DOIs for the same article
• The DOIs are also in a second conflict (139291)
• The state after each conflict ID is null, meaning the conflict is unresolved

Conflicts: what to do about them
• Send us an email explaining how to resolve the conflict:
  Make one DOI primary and all others aliases
  Primary DOI | DOI to be aliased to primary | Conflict ID
  10.1016/j.clindermatol.2003.11.001 | 10.1016/S0738-081X(03)00103-2 | 104155
  10.1016/j.clindermatol.2003.12.026 | 10.1016/S0738-081X(03)00150-0 | 104157
  10.1016/j.clindermatol.2003.12.031 | 10.1016/S0738-081X(03)00153-6 | 104159
  Resolve the conflict without doing anything
  Conflict IDs: 101115, 103044, 103048, 105650
• Resend one of the DOIs with new (different) metadata
• (Soon) log in to doi.crossref.org and resolve them yourself

Conflicts: prevent them
<journal_article publication_type="full_text">
  <titles><title>Phys. Rev. A</title></titles>
  <contributors>
    <person_name sequence="first" contributor_role="author">
      <given_name>Petr O.</given_name>
      <surname>Fedichev</surname>
    </person_name>
  </contributors>
  <publication_date media_type="online">
    <month>04</month>
    <year>2004</year>
  </publication_date>
  <publisher_item>
    <item_number item_number_type="sequence-number">PhysRevA.69.049902</item_number>
  </publisher_item>
  <doi_data>
    <doi>10.1103/PhysRevA.69.049902</doi>
    <timestamp>20040412120604</timestamp>
    <resource>http://link.aps.org/doi/10.1103/PhysRevA.69.049902</resource>
  </doi_data>
</journal_article>

Issues
Data quality
• Missing fields (publishing ahead of print is OK, but update the record when the data is available)
• First author being mixed up with other contributors
Journal titles
• Full titles in a query without an ISSN may cause misses
• Two recent fuzzy match changes have had an effect:
  1. Eliminated a dangerous rule that could return false positives when the title and ISSN did not match well
  2. Lowered the threshold on matching long titles

Issues
Depositing a new title
• If you send in two files at the same time with DOIs for a new title, it may result in two title entries in CrossRef
DOIs for journal titles and issues
• These can be created in the <journal_metadata> and <issue_metadata> tags
Page numbers with alpha characters
• ‘S110’ or ‘110S’ is handled better than ‘30-1’
  110 in a query will match S110 or 110S
  30 in a query will not match 30-1
• Example DOIs: 10.1016/S0003-4975(02)04151-6, 10.1029/2002GL014973
• ‘20-a’ should be OK
• ‘69F-a’ will only match an exact string

Issues
Query results in XML format
servlet/query?usr=<username>&pwd=<password>&type=<queryType>&format=<resultFormat>&qdata=…
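As a rough illustration, the query URL above can be assembled in Java with each parameter value percent-encoded. This is a sketch under stated assumptions: the host http://doi.crossref.org is borrowed from the other servlet URLs in this deck (this slide shows only the servlet/query path), type=journal is assumed to select the ‘journal’ query, and the credentials are placeholders. The qdata value is the piped SPIE query from the Unified query slide, and xsd_xml is the schema-based result format.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryUrl {

    // Assumed endpoint: the slide gives only the "servlet/query" path;
    // the host is taken from the other servlet URLs in this deck.
    static final String BASE = "http://doi.crossref.org/servlet/query";

    // Percent-encodes one parameter value as UTF-8 ('|' -> %7C, ' ' -> '+').
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM must support UTF-8, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    // Builds the query URL in the parameter order shown on the slide.
    static String build(String usr, String pwd, String type,
                        String format, String qdata) {
        return BASE
                + "?usr=" + enc(usr)
                + "&pwd=" + enc(pwd)
                + "&type=" + enc(type)
                + "&format=" + enc(format)
                + "&qdata=" + enc(qdata);
    }

    public static void main(String[] args) {
        // Placeholder credentials; "journal" as the type value is an assumption.
        String qdata = "0277786X|Proceedings of SPIE||4272||133|2001|||";
        System.out.println(build("USERNAME", "PASSWORD", "journal", "xsd_xml", qdata));
    }
}
```

Encoding matters mostly for qdata, whose pipes and spaces are not legal as-is in a URL.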
Result format can be: piped, xml, or xsd_xml
• xml is the old legacy XML format (no schema)
• xsd_xml has a schema and includes all new features: http://www.crossref.org/qrschema/crossref_query_output2.0.xsd
• Use of the legacy XML format should be discontinued

Issues
Each batch ID / query key combination must be unique.
Deposit:
<head>
  <doi_batch_id>SPI_2004-09-15_09-48-22</doi_batch_id>
…
<doi_data>
  <doi>10.1007/BF00393374</doi>
  <resource><![CDATA[http://www.springerlink.com/index/10.1007/BF00393374]]></resource>
</doi_data>
<citation_list>
  <citation key="CR1">
    <journal_title>Appl Environ Microbiol</journal_title>
    <author>RI Amann</author>
    <volume>56</volume>
…
Deposit log:
<citations_diagnostic>
  <citation key="CR1" status="warning">A stored query with doi_batch_id=SPI_2004-09-15_09-48-22 and query_key=CR1 already exists for the same depositor</citation>

Issues
Upload timeouts
• Large (250K+) files may not be completing the upload; no HTTP response is returned
• Only a few users seem to be affected (some can upload 1M+ files)
Solutions (& workarounds)
• Break the files up (some are doing one DOI per file)
• CrossRef to investigate session timeouts
• Let me know if you’re having this problem

DX contingency planning
CrossRef will be running a secondary Handle system and a DOI proxy resolver.
• The secondary Handle server receives updates from the DOI primary (at CNRI) about 15 minutes after DOIs are created/updated by CrossRef
[Diagram: deposit flow, steps 1-3]
• The proxy will share the load going to http://dx.doi.org (a DNS subdomain will direct traffic to several IPs)

New features
• Unified query
• Tracking ID
• Open Channel Interface
• Forward linking
• Local hosting changes

Unified query
Journals, conf.
proceedings and books have different metadata, so queries must examine different fields … and it gets worse:
• Proceedings have an event name and an event acronym
• Proceedings have an event date and a publication date
• Proceedings and books have ISBNs and/or ISSNs
The real problem is that it’s hard to tell from a reference what kind of item is being referenced.

Unified query
The solution is to have one query that examines everything and returns the right result.
Step 1: change the current ‘journal’ query so that the ‘title’ field also examines the proceedings event name and event acronym, and the ‘issn’ field also examines proceedings ISSNs.
‘journal’ query (only one title):
0277786X|Proceedings of SPIE||4272||133|2001|||
‘proceedings’ result (two title fields; series is empty):
0277786X||Proceedings of SPIE |Srinivasan|4272||133|2001 ||full_text ||10.1117/12.430790

Tracking IDs
Retrieve a deposit’s log file by batch ID:
http://doi.crossref.org/servlet/submissionDownload?usr=<USR>&pwd=<PWD>&doi_batch_id=NJ028011-b406513a&type=result
or by file name:
http://doi.crossref.org/servlet/submissionDownload?usr=<USR>&pwd=<PWD>&file_name=b406513a_doi.xml&type=result
Both return the log file.
With type=contents, the request returns the XML deposit file:
http://doi.crossref.org/servlet/submissionDownload?usr=<USR>&pwd=<PWD>&file_name=b406513a_doi.xml&type=contents

Open Channel Interface
• ‘Premium’ fee has been dropped
• Available on a case-by-case basis
• Continuous connection to CrossRef for piped queries
• Response time can be 10X better than HTTP queries
import java.io.*;
import java.net.*;
// One persistent connection reused for many piped queries; host and port are the endpoint CrossRef provides (not given here)
public class OpenChannel implements Closeable {
    private final Socket socket;
    private final PrintWriter out;
    private final BufferedReader in;
    public OpenChannel(String host, int port) throws IOException {
        socket = new Socket(host, port);
        out = new PrintWriter(socket.getOutputStream(), true); // autoflush on println
        in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
    }
    // Sends one piped query (qData) and reads back one line of result
    public String query(String qData) throws IOException {
        out.println(qData);
        return in.readLine();
    }
    public void close() throws IOException { socket.close(); } // also closes both streams
}

Forward Linking
Forward linking deposits are simply the references listed in the
bibliography. You most likely already send this data to CrossRef in the form of queries (in fact, reference deposits look very much like queries).
There are two ways to deposit references for an article:
1. Send them in with the article’s metadata
2. Send them in separately, after the article’s DOI and metadata are deposited

Forward Linking – deposit log
<?xml version="1.0" encoding="UTF-8"?>
<doi_batch_diagnostic status="completed">
  <submission_id>115276193</submission_id>
  <batch_id>4219-com.wiley.cch.processes.JournalToDOI16047.xref</batch_id>
  <record_diagnostic status="Success">
    <doi>10.1002/(ISSN)1097-0134</doi>
    <msg>Successfully updated in handle</msg>
  </record_diagnostic>
  <record_diagnostic status="Success">
    <doi>10.1002/prot.20276</doi>
    <msg>Successfully added</msg>
    <citations_diagnostic>
      <citation key="10.1002/prot.20276-BIB1" status="stored_query" />
      <citation key="10.1002/prot.20276-BIB2" status="resolved_reference">10.1006/jsbi.2001.4428</citation>
      <citation key="10.1002/prot.20276-BIB3" status="resolved_reference">10.1110/ps.0227803</citation>
      <citation key="10.1002/prot.20276-BIB4" status="resolved_reference">10.1006/jmbi.1990.9999</citation>
      <citation key="10.1002/prot.20276-BIB5" status="stored_query" />
      <citation key="10.1002/prot.20276-BIB6" status="stored_query" />
      <citation key="10.1002/prot.20276-BIB7" status="stored_query" />
      …

Forward Linking – query
<?xml version="1.0" encoding="UTF-8"?>
<query_batch version="2.0" xmlns="http://www.crossref.org/qschema/2.0">
  <head>
    <email_address>ckoscher@crossref.org</email_address>
    <doi_batch_id>fl_001</doi_batch_id>
  </head>
  <body>
    <fl_query alert="false">
      <doi>10.1110/ps.0227803</doi>
    </fl_query>
  </body>
</query_batch>
Note: only user ‘coldspring’ can run this query

Multiple resolution
Multiple
resolution presents choices to the user from the site of the link.
An XML CrossRef deposit sets up the menu and multiple links (sample).

Multiple resolution - deposit
A normal link is built with the <a> (anchor) tag:
<a href="http://dx.doi.org/10.5555/sample-doi">The Link Text</a>
A multiple resolution link is built with the <script> tag.
One instance of <script> loads the menu library (the menu builder code):
<script src="http://www.crossref.org/MRLoader/milonic_src.js"></script>
For each link, the src carries the DOI followed by the link text:
<script src="http://www.crossref.org/MRLoader/MR/10.5555/sample-doi?The%20Link%20Text"></script>

Multiple resolution - deployment
Multiple resolution deployment requires three things:
1. Registration of multiple targets for a given DOI
2. Operation of the MRLoader resolver
3. Construction of MR links on Web pages
Everyone has a part to play:
1. Publishers that ‘own’ the target DOI must register multiple targets (or authorize a 3rd party to do so)
2. CrossRef and/or the content owner publisher must operate the MRLoader resolver
3. Every Web page that links to the MR-enabled DOI must replace <a> tags with <script> tags

Web Deposit Form
• Allows users to enter the metadata for a deposit using a Web form; no XML skills required
• Supports journal articles; now working to add conference proceedings and books. Later, will add reference deposits and components
• You must know your CrossRef member login
www.crossref.org => Member Area => Member Resources => web deposit form
http://www.crossref.org/webDeposit

XML Queries
• XML queries provide a more structured format and enable features unavailable in piped queries:
1. Enable multiple hits
2. Control over which fields are fuzzy matched
3. Forward linking queries
4. Query match alerts

Metadata query
<?xml version="1.0" encoding="UTF-8"?>
<query_batch version="1.0" xmlns="http://www.crossref.org/qschema/2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <head>
    <email_address>name@address.com</email_address>
    <doi_batch_id>SomeTrackingID2</doi_batch_id>
  </head>
  <body>
    <query key="MyKey1" enable-multiple-hits="false" forward-match="false">
      <issn>10408746</issn>
      <journal_title>Current Opinion in Oncology</journal_title>
      <author>Chauncey</author>
      <volume>13</volume>
      <issue>1</issue>
      <first_page>21</first_page>
      <year>2001</year>
    </query>
  </body>
</query_batch>
• Order is important
• Fields can be omitted

Fuzzy match control
• Fields with a "match" attribute can be controlled:
  ISSN: optional, exact
  Journal/Volume: optional, fuzzy, exact (default="fuzzy")
  Series Title: optional, null, fuzzy, exact (default="fuzzy")
  Author: optional, fuzzy, null, exact (default="fuzzy")
  Volume: optional, fuzzy, exact (default="fuzzy")
  Issue: optional, fuzzy, exact (default="fuzzy")
  Page: optional, null, exact (default="optional")
Example: <journal_title match="exact">Current Opinion in Oncology</journal_title>

A word on special characters
• Metadata deposits are supposed to be UTF-8 Unicode
  é = &#233; (decimal) = &#xE9; (hex)
Deposited record 10.5555/char_test_001:
ISSN: 12345678
Title: Test Publication
Author: Joénes
Volume: 12
Issue: 1
Page: S125
Year: 1999
Queries:
<journal_title>Test Publication</journal_title>
<author>Joenes</author>
<volume>12</volume>
<first_page>125</first_page>
<year>1999</year>
Works because the page is supplied
<journal_title>Test Publication</journal_title>
<author>Joenes</author>
<volume>12</volume>
<year>1999</year>
Does NOT work. Arrggghh…
<journal_title>Test Publication</journal_title>
<author>Jo&#233;nes</author>
<volume>12</volume>
<year>1999</year>
Works because the correct author is supplied

Stored Queries
• CrossRef remembers queries that do not initially match and sends an email notice when they finally do.
<?xml version="1.0" encoding="UTF-8"?>
<query_batch version="1.0" xmlns="http://www.crossref.o…">
  <head>
    <email_address>ckoscher@crossref.org</email_address>
    <doi_batch_id>fm_429_001</doi_batch_id>
  </head>
  <body>
    <query key="fm_1" enable-multiple-hits="false" forward-match="true">
      <journal_title>Test Publication</journal_title>
      <author>Anderson</author>
      <volume>33</volume>
      <issue>9</issue>
      <first_page>125</first_page>
      <year>2002</year>
    </query>
  </body>
</query_batch>

Log message when the query is submitted
<?xml version="1.0" encoding="UTF-8" ?>
<crossref_result version="2.0" xmlns="http://www.crossref.org/qrschema/2.0" …">
  <query_result>
    <head>
      <email_address>ckoscher@crossref.org</email_address>
      <doi_batch_id>fm_429_001</doi_batch_id>
    </head>
    <body>
      <query status="unresolved">
        <journal_title>Test Publication</journal_title>
        <author>Anderson</author>
        <volume>33</volume>
        <issue>9</issue>
        <first_page>125</first_page>
        <year>2002</year>
        <msg>Query stored in CrossRef for forward matching</msg>
      </query>
    </body>
  </query_result>
</crossref_result>

Results email
Subject: Crossref stored query match: doi_batch_id= fm_429_001 ; query_key= fm_1
<?xml version = "1.0" encoding = "UTF-8"?>
<crossref_result version="2.0" xmlns="http://www….-instance… ">
  <query_result>
    <head>
      <email_address>ckoscher@crossref.org</email_address>
      <doi_batch_id> fm_429_001 </doi_batch_id>
    </head>
    <body>
      <query key="fm_1" status="resolved">
        <doi>10.5555/forward_match_test_2</doi>
        <issn>12345678</issn>
        <journal_title match="exact">Test Publication</journal_title>
        <author match="exact">Smith</author>
        <volume match="exact">3</volume>
        <issue>2</issue>
        <first_page match="exact">100</first_page>
        <year match="exact">1985</year>
        <publication_type>full_text</publication_type>
      </query>
    </body>
  </query_result>
</crossref_result>

Polling for Query Matches
• You can interrogate the system to get a list of queries that may have matched:
http://doi.crossref.org/servlet/downloadStoredQueries?usr=creftest&pwd=c53test&startDate=2004-03-31&endDate=2004-05-03

Forward Linking Queries
• Forward linking is an ‘opt-in’ service
• Fees: a surcharge on the annual membership
• Permission must be enabled by a CrossRef administrator

Forward Linking Query Example
Reference deposit #1 (log)
Reference deposit #2 (log)
<?xml version="1.0" encoding="UTF-8"?>
<query_batch version="2.0" xmlns="http://www.crossref.org/qschema/2.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.crossref.org/qschema/2.0 http://www.crossref.org/qschema/crossref_query_input2.0.xsd">
  <head>
    <email_address>ckoscher@crossref.org</email_address>
    <doi_batch_id>fl_001</doi_batch_id>
  </head>
  <body>
    <fl_query alert="true">
      <doi>10.1097/00001622-200101000-00005</doi>
    </fl_query>
  </body>
</query_batch>
Sample: forward linking query results

Forward Linking Alerts
• Once you’ve made a forward link query, the deposit of any new article that cites the DOI you requested will generate an alert email
Reference deposit #3 (log)
Email alert notice

New Initiatives
• Components
• Expanded content types
• Plans for 2005

Component Deposits
What is a component?
Components are sub-items that are part of the construction of an article, chapter, or conference paper, or that provide supporting (sometimes called supplemental) information.
These items are not typically cited by themselves in a bibliography, but they are cited within the text.
NOTE: Title DOIs and issue DOIs are not components; they should be deposited in the journal, conf-proc, or book metadata.

Component Deposits
Why create DOIs for components?
• To improve link management
• To build persistent links
• To use multiple resolution on them
How are components deposited?
• Schema version 3.0.3 supports components
• Deposit as part of an article’s metadata or standalone (note: a parent DOI must be specified)

Component Deposits
What component services will CrossRef offer?
Near term
• Just the registration of the DOI
Long term
• Some form of lookup service (e.g. query)
• Expanded component metadata (licensing, copyright …?)

Component Deposits
<journal_article>
  ...
  <doi_data>
    <doi>10.9876/S0003695199019014</doi>
    <resource>http://ojps.aip.org:18000/link/?apl/74/1/76/ab</resource>
  </doi_data>
  <component parent_relation="isPartOf">
    <description><b>Figure 1:</b> This is the caption of the first figure...</description>
    <format mime_type="image/jpeg">Web resolution image</format>
    <doi_data>
      <doi>10.9876/S0003695199019014/f1</doi>
      <resource>http://ojps.aip.org:18000/link/?apl/74/1/76/f1</resource>
    </doi_data>
  </component>
  <component parent_relation="isReferencedBy">
    <description><b>Video 1:</b> This is a description of the video...</description>
    <format mime_type="video/mpeg"/>
    <doi_data>
      <doi>10.9876/S0003695199019014/video1</doi>
      <resource>http://ojps.aip.org:18000/link/?apl/74/1/76/video1</resource>
    </doi_data>
  </component>
</journal_article>

Component Deposits
Alternatively, components may be deposited separately from their ‘parent’ item’s metadata:
<body>
  <sa_component>
    <doi>10.9876/molcell/10/4</doi>
    <component parent_relation="isPartOf">
      <description>Cover Image, Molecular Cell, Volume 10, Issue 4, January 2004</description>
      <format mime_type="image/tiff"/>
      <doi_data>
        <doi>10.9876/molcell/10/4/cover</doi>
        <resource>http://molcell.org/10/4/cover</resource>
      </doi_data>
    </component>
  </sa_component>
</body>

Expanded Content Types
• Metadata study now underway: dissertations, technical reports, working papers, standards, patents, and databases
• Implementation to occur in early 2005
• XML schema will be updated to version 4.0
  Deposits
  Query services
    Extend the current query mechanism?
    ‘Firewall’ current content?

Expanded Content Types
Dissertations
<dissertation> <person_name> <titles> <acceptance_date> <university> <name> <location> <department> <degree> <publisher_item> <doi_data>
• Add <advisor> elements
• Review NDLTD metadata standards
• Survey T & D organizations: Caltech, ProQuest, Texas A&M, ?

Expanded Content Types
Reports
<report> <contributors> <titles> <publisher> <publication_date> <publisher_item> <series_metadata> <isbn> <issn> <research_organization> <sponsor> <organization> <contract> <doi_data>
• Drop ‘technical’ label
• Support chapters?
• Survey organizations: AGU, NASA/JPL, other government?, ?

Expanded Content Types
Working papers
<report> <contributors> <titles> <publisher> or <university> <publication_date> <publisher_item> <series_metadata> <isbn> <issn> <research_organization> <sponsor> <organization> <contract> <doi_data>
• Conflicts with published articles
• Include series metadata?
• Survey organizations: ?
Expanded Content Types
Standards
• Not included in the initial analysis; added after the annual member meeting
• Interest from IEEE
• Metadata draft development TBD
• Accredited vs. consortium standards
• Survey organizations: NISO, ANSI, BSI, ISO, IEEE, ConsortiumInfo.org

Plans for 2005
• Modify / improve page number processing
• Normalized XML
• Modularize the CrossRef system
• Implement Expanded Content Types
• Others

Page & article numbers
• The CrossRef deposit schema allows for both a first page and an article number:
  <pages><first_page>
  <publisher_item><item_number item_number_type="article-number">
• The article number will be used if no first page is provided
• A query has only one ‘page’ field and will search either first_page or article_number, but not both
• Some articles have both, and both are presented to the reader. Options:
  change the query logic to search both fields
  add an XML query field for ‘article_number’
• Page numbers (and article numbers?) are not numbers: would a full fuzzy match on page improve matching rates?

Normalized XML
• Journal, proceedings, and book content is stored in two places in the CrossRef database:
  1. A subset in tables/columns to support query operations
  2. The entire deposit as a CLOB (not easily accessed)
• XML query results are specialized for each content type (<journal_cite>, <conf_cite>, <book_cite>)
• Reduce all content type info to a simpler ‘one size fits all’ schema
• Store each DOI record as XML in a database column (memo?)
  Facilitates access to all metadata (e.g. complete ‘lite’-weight local host files)
  Yields a more consistent XML query result

Modularize
• The current system is a monolith: one database supports everything
• Separate operations to improve performance and scalability:
  Deposits & updates
  Queries
  Reports
• Additional benefit: local hosting of the CrossRef query system, not just the metadata

Other
Components
• Query mechanism
• Expand the metadata (license, rights …)
Production implementation of multiple resolution
• Integrate into the deposit process
• Implement a local host type option
Automatic appropriate copy service

Questions / Discussion