Unlock the books with IntelligentCAPTURE

Xavier Baumgartner

University of St. Gallen

Outline

• 1 Background of the Project:

– Euregio Bodensee - Library Cooperation

– Project by AGI and VLB (Vorarlberger Landesbibliothek)

– IBH = Internationale Bodenseehochschule

• 2 Project Partners:

– AGI: http://www.agi-imc.de/

– Libraries

Outline

• 3 Project Tools:

– intelligentCAPTURE

– IC CAI-Engine

– intelligentSEARCH

• 4 Project Results:

– Library catalogue: http://www.vorarlberg.at/vlb/

– Portal: http://www.dandelon.com

1 Background

Euregio Bodensee

- Region extending for roughly 50 km around Lake Constance (Bodensee)

- Covers the southern German districts of Konstanz, Sigmaringen, Ravensburg, Lindau, Oberallgäu and Bodenseekreis

- Austrian province of Vorarlberg

- Swiss cantons of St. Gallen, Schaffhausen, Appenzell-Innerrhoden and Appenzell-Ausserrhoden

- Principality of Liechtenstein

1 Background

Euregio Bodensee - Library Cooperation

http://www.ub.uni-konstanz.de/euregio/bodkat.htm

http://www.ub.uni-konstanz.de/boddb/

1 Background

IBH = Internationale Bodensee-Hochschule

International Lake Constance University

- Virtual University

- Network of 24 independent universities

- Aim: promote cooperation among member universities in fields of science, research and infrastructure

- Use synergies to mutual advantage

2 Project Partners

AGI - Information Management Consultants

- Focused on information and knowledge management

- Consulting

- Software development and long-term maintenance

- Use advanced recognition technologies in:

Automatic indexing and text mining (CAI)

Machine translation (MT)

Optical character recognition (OCR)

Recognition of text structures in PDF documents

Voice recognition

2 Project Partners

AGI - Information Management Consultants

Products:

- based on IBM technical platform Lotus Notes & Domino

- intelligentCAPTURE -> tool for document capturing and machine indexing

- IC INDEX -> tool for developing topic maps, taxonomies, thesauri and classifications

- intelligentSEARCH -> tool for information retrieval and visualization

2 Project Partners

Libraries

- University of Applied Sciences Dornbirn

- University of Applied Sciences Kempten

- University of Applied Sciences Liechtenstein

- Central Library Zurich, for the University of Zurich

- University of Applied Sciences Konstanz

- University of St. Gallen

3 Project tools

intelligentCAPTURE

- Software intelligentCAPTURE installed locally and connected to scanner

- Workflow:

- Identification of document via barcode

- Scanning table of contents of books

- Character recognition process (OCR)

- Quick check of result of OCR

3 Project tools

intelligentCAPTURE

- Workflow (cont):

- Generation of PDF file

- Compression of files

- Automatic indexing (CAI engine)

- Transfer of PDF file to file system

- Export of indexing results and PDF files to:

- Local library system

- Local intelligentSEARCH database

- Central database, hosted by AGI
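The workflow above is a fixed sequence of steps applied to one document at a time. The following is a minimal sketch of that ordering only, not of the intelligentCAPTURE software itself: the `CaptureJob` class, the `step` helper and the sample barcode are hypothetical names invented for illustration, and each step just records that it ran.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CaptureJob:
    """State carried through one capture run (hypothetical structure)."""
    barcode: str
    log: List[str] = field(default_factory=list)


def step(name: str) -> Callable[[CaptureJob], CaptureJob]:
    """Build a placeholder step that only records that it ran."""
    def run(job: CaptureJob) -> CaptureJob:
        job.log.append(name)
        return job
    return run


# Order follows the workflow listed on the slides.
WORKFLOW = [
    step("identify document via barcode"),
    step("scan table of contents"),
    step("character recognition (OCR)"),
    step("quick check of OCR result"),
    step("generate PDF file"),
    step("compress files"),
    step("automatic indexing (CAI engine)"),
    step("transfer PDF file to file system"),
    step("export to library system, local and central databases"),
]


def run_workflow(barcode: str) -> CaptureJob:
    """Run every step in order on a fresh job."""
    job = CaptureJob(barcode=barcode)
    for current_step in WORKFLOW:
        job = current_step(job)
    return job


if __name__ == "__main__":
    # The barcode value is made up for the example.
    finished = run_workflow("EXAMPLE-000123")
    for number, name in enumerate(finished.log, start=1):
        print(number, name)
```

Keeping the steps in a plain list makes the order explicit and easy to compare with the slide.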

3 Project tools

IC CAI Engine

- Automatic indexing is much more specific and comprehensive than indexing the title alone or intellectual indexing with a controlled vocabulary

- Document analysis based on linguistic methods and procedures from computational linguistics

- All words are reduced to their linguistic base form (morphemes)

- Uses large semantic nets (thesauri, topic maps etc.)

- Statistical rules for relevance ranking
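The three ideas above (reduction to base forms, lookup in a semantic net, statistical ranking) can be illustrated with a toy example. This is a sketch under strong assumptions and not the CAI engine's method: the base-form table, thesaurus and stopword list are invented for illustration, and plain term frequency stands in for the engine's statistical rules, which the slides do not describe.

```python
import re
from collections import Counter

# Toy stand-ins, invented for illustration only.
BASE_FORMS = {
    "libraries": "library",
    "books": "book",
    "catalogues": "catalogue",
    "indexed": "index",
    "indexing": "index",
}
THESAURUS = {"library", "book", "catalogue", "index"}
STOPWORDS = {"the", "of", "and", "in", "for", "a", "to"}


def base_form(word: str) -> str:
    """Reduce a word to its base form via the toy lookup table."""
    return BASE_FORMS.get(word, word)


def index_text(text: str, top_n: int = 5):
    """Return (descriptor, frequency) pairs for thesaurus terms in the text."""
    words = re.findall(r"[a-zäöüß]+", text.lower())
    terms = [base_form(w) for w in words if w not in STOPWORDS]
    counts = Counter(terms)
    # Keep only terms known to the thesaurus, most frequent first.
    descriptors = [(t, n) for t, n in counts.most_common() if t in THESAURUS]
    return descriptors[:top_n]


if __name__ == "__main__":
    sample = "Indexing books in libraries: the library catalogue and indexed books"
    for term, frequency in index_text(sample):
        print(term, frequency)
```

A real engine would replace the lookup table with morphological analysis and the frequency count with weighted relevance rules.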

3 Project tools

IC CAI-Engine

- Output of most important terms in groups:

- geographical terms

- personal/corporate terms

- branches / areas of activity

- descriptors: words from internal thesaurus

- important words and phrases from text

- Libraries: use broad generic thesaurus, approx. 300,000 German terms, and smaller English thesaurus

- Languages: German and English in use, French and Spanish available

[Diagram: Library 1, Library 2 and Library 3 each run iCAPT alongside their ILS; each library sends indexing data and PDF files to AGI]

3 Project tools

intelligentSEARCH

- Search engine with a simple (Google-like) interface and IBM GTR (Global Text Retrieval) as core engine

- Search terms input -> automatically expanded semantically

- Main features of GTR:

Boolean, adjacency, near, paragraph and sentence operators; right and left truncation; wildcards; fuzzy searching; sorting by relevance
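Truncation, wildcard and fuzzy searching can be illustrated with Python's standard library. This sketch does not use or imitate the GTR API: `fnmatchcase` and `get_close_matches` are stand-ins chosen for illustration, and the vocabulary list and the function names `wildcard` and `fuzzy` are hypothetical.

```python
from difflib import get_close_matches
from fnmatch import fnmatchcase

# Toy vocabulary, invented for illustration only.
VOCAB = ["library", "librarian", "catalogue", "cataloguing", "bibliography", "index"]


def wildcard(pattern: str, vocab=VOCAB):
    """Match '*' (any run of characters) and '?' (one character).

    'librar*' behaves like right truncation, '*graphy' like left truncation.
    """
    return [word for word in vocab if fnmatchcase(word, pattern)]


def fuzzy(term: str, vocab=VOCAB, cutoff: float = 0.8):
    """Return vocabulary words similar to a possibly misspelled term."""
    return get_close_matches(term, vocab, n=3, cutoff=cutoff)


if __name__ == "__main__":
    print(wildcard("librar*"))    # right truncation
    print(wildcard("*graphy"))    # left truncation
    print(wildcard("catalogu?"))  # single-character wildcard
    print(fuzzy("libary"))        # misspelled search term
```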

3 Project tools

intelligentSEARCH

- AGI developed features:

- Highlighting

- Interfaces to library systems, booksellers and the web (via Google)

- Query expansion by semantic nets

- Visualization and browsing of topic maps
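Query expansion and highlighting work together: the search terms are widened with related terms from a semantic net, and every matching term is then marked in the result text. The sketch below assumes a tiny hand-made net and bracket markers. The names `SEMANTIC_NET`, `expand_query` and `highlight` are hypothetical and do not reflect how intelligentSEARCH is implemented.

```python
import re

# Toy semantic net, invented for illustration only.
SEMANTIC_NET = {
    "library": ["catalogue", "opac"],
    "book": ["monograph", "publication"],
}


def expand_query(terms):
    """Add related terms from the net, keeping order and dropping duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term, *SEMANTIC_NET.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def highlight(text, terms, start="[", end="]"):
    """Wrap every whole-word, case-insensitive match in marker strings."""
    if not terms:
        return text
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda match: f"{start}{match.group(0)}{end}", text)


if __name__ == "__main__":
    query = expand_query(["library"])
    print(query)
    print(highlight("The Library catalogue lists every book.", query))
```

In a web interface the markers would typically be HTML tags rather than brackets.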

4 Project Results

- Library OPAC Vorarlberger Landesbibliothek: http://vlb-katalog.vorarlberg.at

- Portal: www.dandelon.com

4 Project results

www.dandelon.com

- Portal with semantic search engine (intelligentSEARCH)

- Content: automatically indexed contents pages of books and other publications; PDF files of the contents pages

- Search terms expanded semantically

- Relevance ranking

- Highlighting

4 Project results

www.dandelon.com

- Links to libraries holding the book, to booksellers, to internet search engines

- View topic maps
