Streaming resources


Streaming resources – a third class of continuing resources

Some Internet documents have properties unknown in traditional documents, or even in digital documents stored on physical media: At the item level, “streaming” data types have been in use for a few years. We think that streaming resources should be considered in a wider framework. In particular, they should be related to the four-level FRBR model.

Streaming resources characteristics

New information is supplied continuously – not in discrete, numbered, dated (or otherwise identified) parts, as with serials.

The new information is not added to a continuously growing whole, as with a serial – there is no whole to add it to.

Nor is the information integrated into a whole, as with an integrating resource – at any moment, the current information completely replaces any previous information, and is immediately replaced by subsequent information: The content is transient (ephemeral).

Each access to the resource (i.e. at the item level) returns document contents distinct from those of any previous or subsequent access.

These considerations apply to e.g. web cameras, Internet radio/TV, or (most) web newspapers. Web camera and TV images may be updated every 1/30 or 1/25 sec, audio samples every 1/8000 to 1/44100 sec, and web newspapers every few seconds.

Although the update frequency of the latter is low enough to be viewed as “discrete”, their nature is that of continuous updating.

A sound file that can be retrieved and played on demand is not a streaming resource: A previous or later access to the same on-demand resource will return the same file.
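The distinction can be probed, roughly, by accessing a resource several times and comparing what comes back. The Python sketch below is only a heuristic under assumptions of our own: `fetch` is a hypothetical callable standing in for one access (e.g. an HTTP GET), and `classify_by_repeated_access`, `recording` and `webcam` are illustrative names, not an existing tool. Because database-backed items may repeat for a while, identical results over a short interval do not prove that a resource is not streaming.

```python
import hashlib
import itertools
from typing import Callable


def classify_by_repeated_access(fetch: Callable[[], bytes], attempts: int = 3) -> str:
    """Heuristic check: access a resource several times and compare digests.

    `fetch` is a hypothetical callable returning the bytes of one access.
    """
    digests = {hashlib.sha256(fetch()).hexdigest() for _ in range(attempts)}
    # The same bytes on every access: behaves like an on-demand file.
    # Differing bytes: consistent with (but not proof of) a streaming resource.
    return "on-demand" if len(digests) == 1 else "possibly streaming"


# Simulated resources, so the example runs without a network.
def recording() -> bytes:
    return b"the same sound file"


frames = itertools.count()


def webcam() -> bytes:
    return f"frame {next(frames)}".encode()


print(classify_by_repeated_access(recording))  # on-demand
print(classify_by_repeated_access(webcam))     # possibly streaming
```

With `attempts=1` every resource looks "on-demand", which is why the check needs at least two accesses.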

Any (digital or analog) conventional radio or TV channel qualifies as a streaming resource.

Gray zones

The value of a streaming resource is generated (rather than supplied) on demand. There is a gray zone: Some resource items are generated on demand from databases that may have been modified since the previous item was generated, but frequently the returned item will be identical to the previously generated one. Infrequently updated resources may appear either as integrating resources or as serials, depending on whether updates replace old information or are added to it.

While there are gray zones, some resources are by their very nature continuously updated rather than updated in distinct, well-defined increments. Drawing the exact borderline will in some cases involve individual judgement, based on the assumed intention of the publisher.

However, it seems quite clear to us that most web newspaper publishers intend to provide a continuously updated newspaper, rather than one updated in distinct, well-defined increments.

Do streaming resources require any special treatment?

Yes.

The expression no longer consists of a well-defined set of symbols/signs/semantic elements/words/..., but of a continuous stream of such elements. The creative effort of the author of a traditional, persistent expression, such as a novel, consists of selecting specific words to express his work. For a streaming resource such as a web newspaper, the creative effort of the editor consists of managing a continuous stream of news stories from various sources: selecting relevant entries, assigning priorities for presentation on the web page, and making similar decisions that add semantic information to the contents. This is a continuous, ongoing task, in which the creative element can be considered a way of shaping the information stream as much as determining its contents. (In other kinds of streaming media, there may be a stronger element of determining the contents of the information.)

The manifestation level cannot specify the exact rendering of given contents, as these contents are not available. Rather, it must specify a set of rules for obtaining the contents (e.g. SQL statements to access a continuously changing database, or which sound/video source to access) and presentation rules in the form of a template (e.g. an HTML page prototype) providing semantics-independent parameters for the on-the-fly generation of an item. Both the content generation rules and the presentation rules are applied whenever an item is requested.
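A minimal Python sketch of this idea, under assumptions of our own: the `Manifestation` class, its fields, the `stories` table and the template text are hypothetical illustrations. The manifestation holds only a content generation rule (an SQL statement, run here against an in-memory SQLite database) and a presentation rule (an HTML prototype filled in with `string.Template`); an item exists only once both rules are applied to a request.

```python
import sqlite3
from dataclasses import dataclass
from html import escape
from string import Template


@dataclass(frozen=True)
class Manifestation:
    """Hypothetical model: a manifestation stores rules, not contents."""
    content_query: str      # content generation rule (an SQL statement)
    presentation: Template  # presentation rule (an HTML page prototype)

    def generate_item(self, db: sqlite3.Connection) -> str:
        # Both rules are applied anew on every request, so the item reflects
        # whatever the database holds at that moment.
        rows = db.execute(self.content_query).fetchall()
        stories = "".join(f"<li>{escape(title)}</li>" for (title,) in rows)
        return self.presentation.substitute(stories=stories)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stories (title TEXT, priority INTEGER)")
db.execute("INSERT INTO stories VALUES ('Election results', 1)")

front_page = Manifestation(
    content_query="SELECT title FROM stories ORDER BY priority LIMIT 10",
    presentation=Template("<html><ul>$stories</ul></html>"),
)

first = front_page.generate_item(db)
db.execute("INSERT INTO stories VALUES ('Storm warning', 0)")  # the stream moves on
second = front_page.generate_item(db)
print(first)   # <html><ul><li>Election results</li></ul></html>
print(second)  # <html><ul><li>Storm warning</li><li>Election results</li></ul></html>
```

The same manifestation, requested twice, yields two different items; nothing in the manifestation itself changed.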

In traditional media, there may be many manifestations of the same expression. The same can be seen in streaming media: A web radio may be available in low, medium and high quality (selected by the user depending on available bandwidth), stereo or mono etc.

An item request generates a unique item. The presentation template at the manifestation level determines the format, and may also define which section of the information stream the item covers: A pay channel may, for example, restrict non-subscribers to 30 seconds of listening/viewing.
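The pay channel restriction can be sketched in the same spirit. In this hypothetical Python model, a live stream is an iterable of `(seconds, payload)` chunks, and a rule attached to the manifestation selects which section of the stream an individual item covers; `item_for_request`, `Chunk` and `PREVIEW_SECONDS` are names assumed for illustration only.

```python
from itertools import takewhile
from typing import Iterable, Iterator, Tuple

# One chunk of a live stream: (seconds since the request started, payload).
Chunk = Tuple[float, bytes]

PREVIEW_SECONDS = 30.0  # non-subscriber limit from the example in the text


def item_for_request(stream: Iterable[Chunk], subscriber: bool) -> Iterator[Chunk]:
    """Return the section of the stream that one requested item covers.

    Subscribers get the stream for as long as they keep listening; others
    get only the chunks from the first PREVIEW_SECONDS seconds.
    """
    if subscriber:
        return iter(stream)
    # takewhile is lazy, so this also works on an endless live stream.
    return takewhile(lambda chunk: chunk[0] < PREVIEW_SECONDS, stream)


# A simulated minute of audio in one-second chunks.
live = [(float(t), b"audio") for t in range(60)]
print(len(list(item_for_request(live, subscriber=False))))  # 30
print(len(list(item_for_request(live, subscriber=True))))   # 60
```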

Preserving streaming resources

Archiving the complete information of a streaming resource item requires continuous capture of the resource whenever new information is streamed. (Many sources take breaks at night; this does not change the streaming resource property in principle.)

In some cases, the actual capture may be done e.g. by offline delivery of recorded information. The National Library of Norway uses such methods both for broadcast radio programs and for a number of web newspapers. This may affect the format – e.g. for offline delivery of newspapers, the presentation template is not applied. So, in addition to the complete capture of the web newspaper information, we capture daily snapshots from the web to preserve the result of applying the manifestation-level presentation template.

The information density of some streaming resources is not sufficient to justify complete preservation: Many web cameras simply display e.g. an image of a street (usually one that can be viewed from the window of the publisher’s facilities). We may consider preserving samples of such streams at regular or irregular intervals, but resource limitations usually prevent preservation of the complete stream.
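Sampling such a stream amounts to choosing capture times. The Python sketch below, with a hypothetical function name and illustrative parameter values, produces either a regular schedule or, when `jitter_s` is non-zero, an irregular one in which each capture is shifted randomly around its regular slot. It is one possible sampling scheme, not a description of any library's practice.

```python
import random
from typing import List, Optional


def sample_times(duration_s: float, interval_s: float,
                 jitter_s: float = 0.0, seed: Optional[int] = None) -> List[float]:
    """Capture times (seconds from start) for sampling a low-density stream.

    jitter_s == 0 gives regular sampling; jitter_s > 0 moves each capture by
    up to +/- jitter_s seconds around its regular slot (irregular sampling).
    Keep jitter_s below interval_s / 2 so captures stay in order.
    """
    rng = random.Random(seed)
    times: List[float] = []
    slot = 0.0
    while slot < duration_s:
        offset = rng.uniform(-jitter_s, jitter_s) if jitter_s else 0.0
        # Keep every capture inside the observation period.
        times.append(min(max(slot + offset, 0.0), duration_s))
        slot += interval_s
    return times


DAY = 24 * 60 * 60  # seconds
hourly = sample_times(DAY, 3600)
print(len(hourly), hourly[:3])  # 24 [0.0, 3600.0, 7200.0]
irregular = sample_times(DAY, 3600, jitter_s=600, seed=1)
print(len(irregular))           # 24
```

Hourly sampling keeps 24 images per day, compared with 86,400 × 25 = 2,160,000 frames for a complete day of a camera updated every 1/25 sec.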

In the general case, we cannot expect to be able to preserve all available streaming resources completely; complete preservation will be restricted to special cases.

So, should we at all consider streaming resources for cataloguing?

The fact that we cannot (in the general case) keep a copy of the resource in-house is an argument against cataloguing streaming resources: We catalogue only our holdings – the resources we have in-house.

But this isn’t strictly true: A new serial may be catalogued based on a single first issue, which for a weekly publication constitutes a mere 2% of the contents of the first year of publication!
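The arithmetic behind the 2% figure, assuming 52 issues per year, together with the fraction the first issue represents after ten years of publication:

```latex
\frac{1}{52} \approx 0.019 \approx 2\,\%,
\qquad
\frac{1}{10 \times 52} = \frac{1}{520} \approx 0.0019 \approx 0.2\,\%
```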

As the years go by, cataloguing is based on a vanishingly small fraction of the information supplied by the resource. Basing a catalogue entry on a small sample of the current information of a streaming resource, available at cataloguing time, would not be different in principle.

A principal difference between serials and streaming resources is that for the latter, previous information is not available – the user can access only the current information, not older information.

However, streaming resources share this property with integrating resources: A library usually provides only the fully updated version of a loose-leaf resource, and cannot (easily) reconstruct the integrating resource as it appeared at some earlier time.

An increasing number of traditional newspapers are moving into the web era. Radio, TV and streaming Internet media are gradually taking over parts of the information function earlier provided by static printed resources. Ignoring these information sources would be to say that we are not willing to manage essential information channels for the younger generation. This would be unfortunate, to say the least.

Where do we go from here?

The concept of continuing resources already defines two main classes: Serials and integrating resources.

Some steps have been taken to fit streaming resources into one of these classes, but even these early efforts, considering only the very simplest cases (e.g. infrequently updated databases), lead to a number of problems.

We believe that recognizing streaming resources as a third class will relieve both the serial and integrating resource classes from a lot of problems.

The next step will be to clarify the borderlines between the classes, defining when a resource is considered streaming rather than integrating or serial.

Once streaming resources have been well defined, the properties of such resources, at different abstraction levels, can be defined. This could be more difficult than expected:

Contrary to traditional resources, which are described by the properties of the result of a creative process (at the expression level) or a production process (at the manifestation/item level), a streaming resource must be described by process parameters for generating the result on the fly.

The Paradigma project at the National Library of Norway has taken the first steps in this direction, by defining the concepts of a dynamic document at the expression level, and their exemplifications in the form of specific documents at the item level. We have been able to incorporate these into the FRBR model with only minor modifications to the interpretation of existing concepts. Yet, we recognize that further refinement of the concepts is needed, and that detailed adjustments of the concepts, and possibly of the FRBR model, may be required to harmonize the properties of streaming resources with the FRBR foundation.

Ketil Albertsen, The National Library of Norway

Jan 16th, 2004
