CS 430: Information Discovery

Lecture 21

Non-Textual Materials 1

Course Administration

Discussion classes

• Attend!

• Speak!

Assignment 2, queries

Mail has been sent to everybody. Contact cs430 if you have any outstanding questions.

Examples

Content                   Attribute

maps                      lat. and long., content
photograph                subject, date and place
bird songs and images     field mark, bird song
software                  task, algorithm
data set                  survey characteristics
video                     subject, date, etc.

Surrogates

Surrogates for searching

• Catalog records

• Finding aids

• Classification schemes

Surrogates for browsing

• Summaries (thumbnails, titles, skims, etc.)

Catalog Records for Non-Textual Materials

• General metadata standards, such as Dublin Core and MARC, can be used to create a textual catalog record of non-textual items.

• Subject-based metadata standards apply to specific categories of materials, e.g., FGDC for geospatial materials.

• Text-based searching methods can be used to search these catalog records.
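To illustrate the last point, here is a minimal sketch of text-based searching over catalog records for non-textual items. The records borrow a few Dublin Core element names (title, subject, date, type); the record contents, the `RECORDS` list, and the `search` function are invented for illustration and do not come from any real catalog.

```python
import re

# Hypothetical catalog records using a few Dublin Core element names.
# All values are invented for illustration.
RECORDS = [
    {"id": "photo-001", "title": "Threshing crew near a sod house",
     "subject": "farming; sod houses", "date": "1895", "type": "StillImage"},
    {"id": "map-001", "title": "Topographic map of Santa Barbara",
     "subject": "topography; coastline", "date": "1994", "type": "Image"},
    {"id": "sound-001", "title": "Song of the wood thrush",
     "subject": "bird song; thrushes", "date": "1988", "type": "Sound"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(records, query, fields=("title", "subject")):
    """Return ids of records whose chosen fields contain every query word."""
    wanted = set(tokenize(query))
    hits = []
    for record in records:
        words = set()
        for field in fields:
            words.update(tokenize(record.get(field, "")))
        if wanted <= words:  # all query words appear in the record
            hits.append(record["id"])
    return hits


print(search(RECORDS, "sod house"))
print(search(RECORDS, "bird song"))
```

The search never looks at the photograph, map, or recording itself; it only matches words in the textual surrogate, which is why the quality of the catalog record determines what can be found.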

Example 1: Photographs

Photographs in the Library of Congress's American Memory collections

In American Memory, each photograph is described by a MARC record.

The photographs are grouped into collections, e.g., The Northern Great Plains, 1880-1920: Photographs from the Fred Hultstrand and F.A. Pazandak Photograph Collections.

Information discovery is by:

• searching the catalog records

• browsing the collections

Photographs: Cataloguing Difficulties

Automatic

• Image recognition methods are very primitive

Manual

• Photographic collections can be very large

• Many photographs may show the same subject

• Photographs have little or no internal metadata (no title page)

• The subject of a photograph may not be known

(Who are the people in a picture? Where is the location?)

Photographs: Difficulties for Users

Searching

• Often difficult to narrow the selection down by searching; browsing is required

• Criteria may be different from those in the catalog (e.g., graphical characteristics)

Browsing

• Offline. Handling many photographs is tedious. Photographs can be damaged by repeated handling.

• Online. Viewing many images can be tedious. Screen quality may be inadequate.

Example 2: Mathematical Software

Netlib

• A digital library of mathematical software (Jack Dongarra and Eric Grosse).

• Exchange of software in numerical analysis, especially for supercomputers with vector or parallel architectures.

• Organization of material assumes that users are mathematicians and scientists who will incorporate the software into their own computer programs.

The collections are arranged in a hierarchy. The editors use their knowledge of the specific field to decide the method of organization.

Example 3: Geospatial Information

Example: Alexandria Digital Library at the University of California, Santa Barbara

• Funded by the NSF Digital Libraries Initiative since 1994.

• Collections include any data referenced by a geographical footprint: terrestrial maps, aerial and satellite photographs, astronomical maps, databases, related textual information

• Program of research with practical implementation at the university's map library

Alexandria User Interface

Computer Systems and User Interfaces

Computer systems

• Digitized maps and geospatial information -- large files

Wavelets provide multi-level decomposition of image

-> first level is a small coarse image

-> extra levels provide greater detail

User interfaces

• Small size of computer displays

• Slow performance of Internet in delivering large files

-> retain state throughout a session
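The multi-level decomposition mentioned under "Computer systems" can be sketched with the simplest wavelet, the Haar wavelet. This is an illustrative sketch only, not the scheme Alexandria actually used: the function names are hypothetical, the transform is unnormalized, and the image is assumed to be a grayscale grid whose width and height are divisible by 2 to the power of the number of levels.

```python
def haar_step(image):
    """One level of an unnormalized 2-D Haar decomposition.

    Returns (coarse, details): coarse has half the rows and columns,
    and details holds the three difference values per 2x2 block that
    are needed to rebuild the input exactly.
    """
    coarse, details = [], []
    for r in range(0, len(image), 2):
        coarse_row, detail_row = [], []
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            coarse_row.append((a + b + d + e) / 4)      # block average
            detail_row.append(((a - b + d - e) / 4,     # left vs. right
                               (a + b - d - e) / 4,     # top vs. bottom
                               (a - b - d + e) / 4))    # diagonal
        coarse.append(coarse_row)
        details.append(detail_row)
    return coarse, details


def haar_unstep(coarse, details):
    """Invert haar_step: double the resolution using one detail level."""
    rows = []
    for coarse_row, detail_row in zip(coarse, details):
        top, bottom = [], []
        for avg, (h, v, dg) in zip(coarse_row, detail_row):
            top += [avg + h + v + dg, avg - h + v - dg]
            bottom += [avg + h - v - dg, avg - h - v + dg]
        rows += [top, bottom]
    return rows


def decompose(image, levels):
    """Return the coarsest image and the detail levels, coarsest first."""
    details, current = [], image
    for _ in range(levels):
        current, band = haar_step(current)
        details.append(band)
    details.reverse()
    return current, details


def reconstruct(coarse, details):
    """Apply as many detail levels as were supplied (possibly not all)."""
    current = coarse
    for band in details:
        current = haar_unstep(current, band)
    return current


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
coarse, details = decompose(image, 2)
print(coarse)                             # small coarse image, sent first
print(reconstruct(coarse, details[:1]))   # one extra level of detail
print(reconstruct(coarse, details))       # full resolution
```

A client can display the coarse image as soon as it arrives and fetch further detail levels only for the region the user zooms into, which addresses both the large files and the slow network noted above.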

Alexandria: Information Discovery

Metadata for information discovery

Coverage: geographical area covered, such as the city of Santa Barbara or the Pacific Ocean.

Scope: varieties of information, such as topographical features, political boundaries, or population density.

Latitude and longitude provide basic metadata for maps and for geographical features.
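One way to use coverage metadata for discovery is to store each item's footprint as a latitude/longitude bounding box and return the items whose box overlaps the query region. The sketch below assumes this simple representation; the `Footprint` class, the item names, and the approximate coordinates are all invented for illustration, and boxes that cross the 180° meridian are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    """Bounding box in decimal degrees (assumed not to cross the 180° meridian)."""
    south: float
    west: float
    north: float
    east: float

    def overlaps(self, other):
        """True if the two boxes share at least one point."""
        return (self.south <= other.north and other.south <= self.north
                and self.west <= other.east and other.west <= self.east)


# Hypothetical items with approximate, invented footprints.
ITEMS = {
    "Santa Barbara street map": Footprint(34.38, -119.80, 34.47, -119.63),
    "California coast satellite image": Footprint(32.5, -124.5, 42.0, -117.0),
    "North Atlantic bathymetry": Footprint(30.0, -70.0, 60.0, -10.0),
}


def find_by_footprint(items, query):
    """Return the names of items whose footprint overlaps the query box."""
    return sorted(name for name, box in items.items() if box.overlaps(query))


downtown = Footprint(34.40, -119.75, 34.45, -119.70)
print(find_by_footprint(ITEMS, downtown))
```

Scope (the variety of information an item carries) would be a second filter applied to the same result list.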

Special Purpose Systems

Many non-textual collections have developed special purpose methods for organizing materials and for information discovery.

Finding Aids and the EAD

Finding aid

• A list, inventory, index or other textual document created by an archive, library or museum to describe holdings.

• May provide fuller information than is normally contained within a catalog record or be less specific.

• Does not necessarily have a detailed record for every item.

The Encoded Archival Description (EAD)

• A format (XML DTD) used to encode electronic versions of finding aids.

• Heavily structured -- much of the information is derived from hierarchical relationships.
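The role of hierarchy can be seen in a small sketch. The fragment below uses a handful of EAD element names (`archdesc`, `did`, `unittitle`, `dsc`, `c01`, `c02`) but is heavily simplified, is not validated against the DTD, and describes an invented collection. The helper functions are hypothetical; they show that a component's meaning depends on the titles of its ancestors.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, invented finding aid; not a valid or complete EAD instance.
FINDING_AID = """
<ead>
  <archdesc level="collection">
    <did><unittitle>Example Family Photograph Collection</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Farm life</unittitle></did>
        <c02 level="file">
          <did><unittitle>Threshing crews, 1895-1900</unittitle></did>
        </c02>
        <c02 level="file">
          <did><unittitle>Sod houses</unittitle></did>
        </c02>
      </c01>
      <c01 level="series">
        <did><unittitle>Portraits</unittitle></did>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def title_of(element):
    """Return the unit title of a described unit, or '' if it has none."""
    node = element.find("did/unittitle")
    return node.text.strip() if node is not None and node.text else ""


def components(element):
    """Return direct component children, looking through <dsc> wrappers."""
    found = []
    for child in element:
        if child.tag == "dsc":
            found.extend(components(child))
        elif re.fullmatch(r"c(\d\d)?", child.tag):
            found.append(child)
    return found


def walk(element, ancestors=()):
    """Yield the chain of titles from the collection down to each unit."""
    path = ancestors + (title_of(element),)
    yield path
    for component in components(element):
        yield from walk(component, path)


def title_paths(xml_text):
    root = ET.fromstring(xml_text.strip())
    return [" > ".join(path) for path in walk(root.find("archdesc"))]


for line in title_paths(FINDING_AID):
    print(line)
```

The file titled "Sod houses" carries no collection or series name of its own; that context is recovered only by walking up the tree, which is what "derived from hierarchical relationships" means in practice.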

GAMS: Guide to Available Mathematical Software

Gazetteer

Gazetteer: a database and a set of procedures that translate between representations of geospatial references: place names, geographic features, coordinates, postal codes, census tracts.

Search engine tailored to the peculiarities of searching for place names.

Research is making steady progress on feature extraction, using automatic programs to identify objects in aerial photographs or printed maps. This remains a topic for long-term research.
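The gazetteer idea can be sketched as a lookup table plus a normalization step that absorbs some of the peculiarities of place names (case, punctuation, abbreviations). Everything here is a hypothetical illustration: the entries, the alias table, and the approximate coordinates are invented, and only one direction of translation (name to coordinates) is shown.

```python
import re

# Hypothetical entries: canonical name -> (latitude, longitude),
# approximate values in decimal degrees.
PLACES = {
    "Santa Barbara": (34.42, -119.70),
    "Ithaca": (42.44, -76.50),
    "Saint Louis": (38.63, -90.20),
}

# Hypothetical variant spellings, keyed by their normalized form.
ALIASES = {
    "st louis": "Saint Louis",
    "santa barbara ca": "Santa Barbara",
}


def normalize(name):
    """Lowercase the name and drop punctuation and extra spaces."""
    return " ".join(re.findall(r"[a-z]+", name.lower()))


# Index every canonical name and alias under its normalized form.
INDEX = {normalize(place): place for place in PLACES}
INDEX.update(ALIASES)


def lookup(name):
    """Return (canonical name, (lat, lon)) or None if the name is unknown."""
    canonical = INDEX.get(normalize(name))
    if canonical is None:
        return None
    return canonical, PLACES[canonical]


print(lookup("St. Louis"))
print(lookup("SANTA BARBARA, CA"))
print(lookup("Atlantis"))
```

A production gazetteer would also need to handle ambiguous names (several places called Ithaca) and the reverse translation from coordinates to names; both are omitted here.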

Collection-Level Metadata

Collection-level metadata is used to describe a group of items.

For example, one record might describe all the images in a photographic collection.

Note: There are proposals to add collection-level metadata records to Dublin Core. However, a collection is not a document-like object.

Direct Searching of Content

Sometimes it is possible to match a query against the content of a digital object. The effectiveness varies from field to field.

Examples

• Images -- crude characteristics of color, texture, shape, etc.

• Music -- optical recognition of score

• Bird song -- spectral analysis of sounds

• Fingerprints
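As an illustration of matching on crude color characteristics, the sketch below compares images by coarse color histograms using histogram intersection. This is one common, simple technique chosen for illustration, not the method of any particular system named in these slides. Images are assumed to be lists of (r, g, b) pixels with values from 0 to 255, and the tiny sample images are invented.

```python
def color_histogram(pixels, bins=2):
    """Normalized histogram of (r, g, b) pixels, `bins` levels per channel."""
    counts = [0] * (bins ** 3)
    for pixel in pixels:
        index = 0
        for value in pixel:
            # Map 0..255 onto 0..bins-1 and combine the three channels.
            index = index * bins + min(value * bins // 256, bins - 1)
        counts[index] += 1
    return [count / len(pixels) for count in counts]


def similarity(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def rank(query_pixels, collection):
    """Return (name, score) pairs, most similar first."""
    query_hist = color_histogram(query_pixels)
    scored = [(name, similarity(query_hist, color_histogram(pixels)))
              for name, pixels in collection.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented four-pixel "images".
COLLECTION = {
    "sunset": [(250, 120, 30)] * 3 + [(20, 20, 60)],
    "forest": [(30, 160, 40)] * 4,
}
query = [(240, 100, 20)] * 4   # mostly orange

print(rank(query, COLLECTION))
```

The histogram ignores where colors appear in the image, which is one reason such matching is described as crude: a sunset and an orange fruit bowl can score alike.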

Image Retrieval: Blobworld

Data Mining

• Extraction of information from online data.

• Not a topic of this course.
