Archives Hub APIs - SRU

SRU stands for Search and Retrieve via URL. It is an XML-focused, RESTful protocol and web service for searching online databases across the internet, and it evolved from Z39.50.

Available operations are:

 Explain

 Search/Retrieve

 Scan

Requests are encoded as URLs. Parameters and their values are case sensitive, and should be encoded in the 'query' portion of the URL (i.e. towards the end, after a question mark ? but before a hash #).

The base URL for the Archives Hub SRU service is: http://archiveshub.ac.uk/api/sru/hub
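As a rough illustration of how a request is assembled, the sketch below places the parameters in the query portion of the base URL using Python's standard library. The helper name build_sru_url is hypothetical and is not part of any Archives Hub client.

```python
from urllib.parse import urlencode, quote

# Base URL of the Archives Hub SRU service, as given above.
BASE_URL = "http://archiveshub.ac.uk/api/sru/hub"


def build_sru_url(params):
    """Return BASE_URL with params percent-encoded in the query portion.

    Hypothetical helper for illustration. Parameter names and values are
    passed through unchanged because they are case sensitive.
    """
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


url = build_sru_url({"operation": "explain", "version": "1.2"})
print(url)
# http://archiveshub.ac.uk/api/sru/hub?operation=explain&version=1.2
```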

Explain

The explain operation allows us to ask the server to explain its own capabilities, including:

 Indexes available to search

 XML Schemas the data is available in

 What extensions, if any, are supported

Required parameters are:

Parameter name: operation
Description: The operation to carry out, in this case 'explain'
Example: explain

Parameter name: version
Description: The highest version of the protocol supported by the client
Example: 1.2
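A minimal sketch of reading index names and schema identifiers out of an explain response follows. It assumes the response uses the usual SRU 1.2 and ZeeRex namespaces. The sample XML is hand-written and heavily trimmed for illustration; it was not captured from the Archives Hub server.

```python
import xml.etree.ElementTree as ET

# Namespace of the ZeeRex explain record (assumed layout, see lead-in).
NS = {"zr": "http://explain.z3950.org/dtd/2.0/"}

# Hand-written sample; a real explain response contains far more detail.
SAMPLE = """<srw:explainResponse xmlns:srw="http://www.loc.gov/zing/srw/">
  <srw:version>1.2</srw:version>
  <srw:record>
    <srw:recordSchema>http://explain.z3950.org/dtd/2.0/</srw:recordSchema>
    <srw:recordPacking>xml</srw:recordPacking>
    <srw:recordData>
      <explain xmlns="http://explain.z3950.org/dtd/2.0/">
        <indexInfo>
          <index><title>Title</title><map><name set="dc">title</name></map></index>
          <index><title>Subject</title><map><name set="dc">subject</name></map></index>
        </indexInfo>
        <schemaInfo>
          <schema identifier="info:srw/schema/1/ead-2002" name="ead"/>
        </schemaInfo>
      </explain>
    </srw:recordData>
  </srw:record>
</srw:explainResponse>"""

root = ET.fromstring(SAMPLE)

# Index names are written as "<context set>.<name>", e.g. dc.title.
indexes = [
    f'{name.get("set")}.{name.text}'
    for name in root.findall(".//zr:index/zr:map/zr:name", NS)
]

# Map each schema's short name to its URI identifier.
schemas = {
    schema.get("name"): schema.get("identifier")
    for schema in root.findall(".//zr:schema", NS)
}

print(indexes)  # ['dc.title', 'dc.subject']
print(schemas)  # {'ead': 'info:srw/schema/1/ead-2002'}
```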

The information that will be returned by an explain request to the Archives Hub SRU server is summarized here in human-friendly format for convenience.

Indexes

None of the indexes in the Archives Hub are case sensitive.

The term 'keywords' indicates that words have been extracted and reduced to their linguistic stems, meaning they lack the variable endings that indicate plurals, possession, or verb conjugation; for example, nurse = nurses = nurse's = nursing.

For subject, name and date indexes, the values are dependent upon their being tagged within a record by the cataloguer – the database does not carry out Natural Language Processing to identify these within prose at this time.

The exact relation means an exact match to the complete value recorded in the index, though wildcards are allowed.

The equals sign is a bit of a magic operator: in the Anywhere, Description and Title indexes it searches for phrases; elsewhere it is synonymous with exact.
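To make the relations concrete, here is a small sketch that formats single CQL clauses. The helper cql_clause and the search terms are made up for illustration; only the index and relation names come from this document.

```python
def cql_clause(index, relation, term):
    """Format one CQL clause as: index relation "term".

    Hypothetical helper. Double quotes inside the term are
    backslash-escaped so that the quoted term stays in one piece.
    """
    escaped = term.replace('"', '\\"')
    return f'{index} {relation} "{escaped}"'


# '=' searches for a phrase in the Title index.
print(cql_clause("dc.title", "=", "coal mining"))
# 'exact' matches the complete indexed value; * is a wildcard.
print(cql_clause("dc.title", "exact", "Papers of *"))
# 'all' requires every keyword, 'any' requires at least one.
print(cql_clause("dc.subject", "all", "coal mining"))
print(cql_clause("dc.subject", "any", "coal mining"))
```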

CQL Index Name: rec.identifier
Available Relations: exact, =
Description of values: Internal identifiers for each record. The values in this index are those used to generate persistent unique URLs for each of the descriptions.

CQL Index Name: cql.anywhere
Available Relations: any, all, =
Description of values: All keywords from all records, regardless of their position within records. = means search for a phrase in this index.

CQL Index Name: dc.description
Available Relations: any, all, =
Description of values: Keywords from specific areas of records that give a good representation of what the record is about. This includes titles, subjects and the description of the scope and content of the collection/item in question. = means search for a phrase in this index.

CQL Index Name: dc.title
Available Relations: exact, all, any, =
Description of values: Precise titles and keywords from titles. Using exact will search for the full and precise title (wildcards are permitted), whereas the other relations will search for keywords, = meaning search for a phrase.

CQL Index Name: dc.identifier
Available Relations: exact, =, any, all
Description of values: Unit identifier, or reference number assigned to a collection or item by the cataloguer. Using the any or all relations will match partial identifiers, assuming that they are separated by a non-alphanumeric character.

CQL Index Name: dc.creator
Available Relations: exact, =, any, all
Description of values: The name of the creator of the collection or item, as recorded by the cataloguer.

CQL Index Name: dc.subject
Available Relations: exact, =, any, all
Description of values: Subjects or topics, as assigned by the cataloguer.

CQL Index Name: bath.name
Available Relations: exact, =, any, all
Description of values: Names of things, people, organizations or places.

CQL Index Name: bath.personalName
Available Relations: exact, =, any, all
Description of values: Names of people.

CQL Index Name: bath.familyName
Available Relations: exact, =, any, all
Description of values: Names of families (surnames).

CQL Index Name: bath.corporateName
Available Relations: exact, =, any, all
Description of values: Names of any organizations, corporations or groups.

CQL Index Name: bath.geographicName
Available Relations: exact, =, any, all
Description of values: Names of places, towns, regions, countries etc.

CQL Index Name: bath.genreForm
Available Relations: exact, =, any, all
Description of values: Types of media represented in the collection or item, e.g. photographs, audio recordings etc.

CQL Index Name: dc.date
Available Relations: exact, =, <, <=, >, >=, any, all, within, encloses, overlaps, >=<
Description of values: Significant dates, most commonly the date of creation of the material.

CQL Index Name: rec.creationDate
Available Relations: =, <, <=, >, >=
Description of values: The date and time at which the record was inserted into the database. Please note that this is not the same as the date the EAD description was created, nor is it guaranteed to remain unaltered; occasionally it may be necessary to recreate the indexes, which will result in the record creation time being updated.

CQL Index Name: rec.lastModifiedDate
Available Relations: =, <, <=, >, >=
Description of values: The date and time at which the index entries for the description were last updated. Please note that this is not necessarily the same as the date the content of the record was modified, nor does it guarantee that the record was actually altered at this time; occasionally it may be necessary to reindex, which will result in the last modification time being updated, as it is not practical to test every record for the presence of actual modifications.

CQL Index Name: vdb.identifier
Available Relations: exact, =, all, any
Description of values: Identifiers for the various contributors to the Archives Hub. This can be used to narrow a search to one or more contributors, if you know their identifiers.

CQL Index Name: ead.istoplevel
Available Relations: exact, =
Description of values: Values in this index are all 1. This index is used as a filter to discriminate collections from the items contained within them.
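The filter indexes can be combined with an ordinary search clause using the CQL boolean operator and. The sketch below assumes that matching ead.istoplevel against 1 selects collection-level descriptions, as the index description suggests. The helper name and the contributor identifier are placeholders, not real values.

```python
def top_level_only(cql, contributor=None):
    """Restrict a CQL query to collection-level descriptions.

    Hypothetical helper. ead.istoplevel acts purely as a filter;
    vdb.identifier optionally narrows the search to one contributor.
    """
    parts = [f"({cql})", "ead.istoplevel = 1"]
    if contributor is not None:
        parts.append(f'vdb.identifier exact "{contributor}"')
    return " and ".join(parts)


print(top_level_only('dc.subject all "money"'))
# (dc.subject all "money") and ead.istoplevel = 1

# "example-id" is a placeholder, not a real contributor identifier.
print(top_level_only('dc.subject all "money"', contributor="example-id"))
```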

Record Schemas

The Archives Hub can return records in the following schemas:

Short name(s): ead
URI: info:srw/schema/1/ead-2002
Description: EAD 2002 – DTD Version

Short name(s): dc, srw_dc
URI: info:srw/schema/1/dc-v1.1
Description: Simple Dublin Core Elements (inside an srw_dc wrapper)

Short name(s): oai_dc
URI: http://www.openarchives.org/OAI/2.0/oai_dc/
Description: Simple Dublin Core Elements (inside an oai_dc wrapper)

Search/Retrieve

The Search and Retrieve operation is really the main operation in SRU – it's the one used to actually search the database and retrieve matching records.

Required parameters are:

Parameter name: operation
Description: The operation to carry out, in this case 'searchRetrieve'
Example: searchRetrieve

Parameter name: version
Description: The highest version of the protocol supported by the client
Example: 1.2

Parameter name: query
Description: The query to execute. The query should be specified in CQL, a separate but related standard syntax. The example says: show me records which have "money" in the subject, and sort the results by relevance. More information, including a useful walk-through of the different parts of a query in CQL, can be found on the SRU website: http://www.loc.gov/standards/sru/specs/cql.html
Example: dc.subject all/relevant "money"
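Because a CQL query contains spaces, quotes and a slash, it has to be percent-encoded before it goes into the URL. The following sketch encodes the example query with Python's standard library; the variable names are illustrative only.

```python
from urllib.parse import urlencode, quote, urlsplit, parse_qs

BASE_URL = "http://archiveshub.ac.uk/api/sru/hub"

params = {
    "operation": "searchRetrieve",
    "version": "1.2",
    "query": 'dc.subject all/relevant "money"',
}

# quote_via=quote encodes spaces as %20 rather than '+'.
url = BASE_URL + "?" + urlencode(params, quote_via=quote)
print(url)
# http://archiveshub.ac.uk/api/sru/hub?operation=searchRetrieve&version=1.2&query=dc.subject%20all%2Frelevant%20%22money%22

# Decoding the query portion recovers the original CQL unchanged.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded)
# dc.subject all/relevant "money"
```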

Optional parameters for the Search/Retrieve operation:

Parameter name: maximumRecords
Description: The maximum number of matching records to be returned. Must be greater than or equal to 0. The server may return fewer records than this, for example if there were fewer matches, but should never return more than this number.
Example(s): 20, 100

Parameter name: startRecord
Description: The position within the result set of the first record to be returned. Must be greater than or equal to 1. The default value is 1. This parameter enables pagination of results.
Example(s): 21, 101

Parameter name: recordSchema
Description: The schema in which the records must be returned. The value may be the URI identifier of the schema, or the short name, as specified in the schemaInfo section of the explain response.
Example(s): info:srw/schema/1/ead-2002, ead
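Pagination with startRecord and maximumRecords comes down to simple arithmetic, sketched below. The helper page_starts is hypothetical, and the total number of matches is assumed to be known from an earlier response.

```python
def page_starts(total_records, maximum_records):
    """Return the startRecord value for each page of a result set.

    Hypothetical helper: total_records is the number of matches and
    maximum_records is the page size requested via maximumRecords.
    """
    if maximum_records < 1:
        raise ValueError("a page must hold at least one record")
    # startRecord is 1-based, so pages begin at 1, 1 + size, 1 + 2 * size, ...
    return range(1, total_records + 1, maximum_records)


print(list(page_starts(45, 20)))    # [1, 21, 41]
print(list(page_starts(250, 100)))  # [1, 101, 201]
```

The second page therefore starts at 21 when maximumRecords is 20, and at 101 when it is 100, matching the example values for startRecord.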

Scan

The Scan operation is used to browse through the search indexes in alphabetical order. Note that this can lead to some confusion with numbers and dates: when sorted alphabetically, 100 comes before 20, and 2011 comes before 300 BC.
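The ordering effect is easy to reproduce. This sketch uses Python's default string comparison as a stand-in for alphabetical order; the server's exact collation rules are not stated here and may differ in detail.

```python
terms = ["20", "100", "300 BC", "2011"]

# Strings are compared character by character, so "1..." sorts before "2...".
alphabetical = sorted(terms)
print(alphabetical)  # ['100', '20', '2011', '300 BC']

# For contrast, sorting purely numeric terms by their integer value.
numeric = sorted(["20", "100"], key=int)
print(numeric)  # ['20', '100']
```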

Required parameters:

Parameter name: operation
Description: The operation to carry out, in this case 'scan'
Example: scan

Parameter name: version
Description: The highest version of the protocol supported by the client
Example: 1.2

Parameter name: scanClause
Description: The index to scan and the point within it to begin, expressed as a complete index, relation, term clause in CQL (see the query parameter of the searchRetrieve operation). The example says: find the point in the personal names index where 'smith' does or will occur.
Example: bath.personalName exact smith

Optional parameters for scan:

Parameter name: maximumTerms
Description: The maximum number of terms to be returned. The server may return fewer terms than this, for example if the end of the index is reached, but should never return more than this number.
Example: 100

Parameter name: responsePosition
Description: The position in the returned list of terms where the specified term should occur. This must be greater than or equal to 0 and less than or equal to maximumTerms + 1. A value of 0 means that the returned list should start immediately after the specified point; maximumTerms + 1 means that the returned list should finish immediately before it. Notice that this means it is possible to scan backwards!
Example: 50
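A sketch of a scan request that also checks the allowed range for responsePosition is given below. The helper scan_url is hypothetical. It insists on an explicit maximumTerms whenever responsePosition is supplied, because the server's default for maximumTerms is not stated here.

```python
from urllib.parse import urlencode, quote

BASE_URL = "http://archiveshub.ac.uk/api/sru/hub"


def scan_url(scan_clause, maximum_terms=None, response_position=None):
    """Build a scan request URL (hypothetical helper)."""
    params = {"operation": "scan", "version": "1.2", "scanClause": scan_clause}
    if maximum_terms is not None:
        params["maximumTerms"] = maximum_terms
    if response_position is not None:
        if maximum_terms is None:
            raise ValueError("responsePosition needs an explicit maximumTerms")
        # Allowed range is 0 .. maximumTerms + 1 inclusive.
        if not 0 <= response_position <= maximum_terms + 1:
            raise ValueError("responsePosition out of range")
        params["responsePosition"] = response_position
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


# Centre the list on 'smith': up to 100 terms, 'smith' at position 50.
print(scan_url("bath.personalName exact smith", 100, 50))

# Scan backwards: the list finishes immediately before 'smith'.
print(scan_url("bath.personalName exact smith", 100, 101))
```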
