• 1 Background of the Project:
– Euregio Bodensee - Library Cooperation
– Project AGI and VLB = Vorarlberger Landesbibliothek
– IBH = Internationale Bodenseehochschule
• 2 Project Partners:
– AGI: http://www.agi-imc.de/
– Libraries
• 3 Project Tools:
– intelligentCAPTURE
– IC CAI-Engine
– intelligentSEARCH
• 4 Project Results:
– Library catalogue: http://www.vorarlberg.at/vlb/
– Portal: http://www.dandelon.com
1 Background
- Region extending roughly 50 km around Lake Constance (Bodensee)
- Covers the southern German districts of Konstanz, Sigmaringen, Ravensburg, Lindau, Oberallgäu, and Bodenseekreis
- Austrian province of Vorarlberg
- Swiss cantons of St. Gallen, Schaffhausen, Appenzell-Innerrhoden, and Appenzell-Ausserrhoden
- Principality of Liechtenstein
1 Background
http://www.ub.uni-konstanz.de/euregio/bodkat.htm
http://www.ub.uni-konstanz.de/boddb/
1 Background
IBH (Internationale Bodenseehochschule)
- Virtual University
- Network of 24 independent universities
- Aim: promote cooperation among member universities in fields of science, research and infrastructure
- Use synergies to mutual advantage
2 Project Partners
- Focused on information and knowledge management
- Consulting
- Software development and long-term maintenance
- Use of advanced recognition technologies in:
  - Automatic indexing and text mining (CAI)
  - Machine translation (MT)
  - Optical character recognition (OCR)
  - Recognition of text structures in PDF documents
  - Voice recognition
2 Project Partners
Products:
- based on IBM technical platform Lotus Notes & Domino
- intelligentCAPTURE -> tool for document capturing and machine indexing
- IC INDEX -> tool for developing topic maps, taxonomies, thesauri and classifications
- intelligentSEARCH -> tool for information retrieval and visualization
2 Project Partners
- University of Applied Sciences Dornbirn
- University of Applied Sciences Kempten
- University of Applied Sciences Liechtenstein
- Central Library Zurich, for the University of Zurich
- University of Applied Sciences Konstanz
- University of St. Gallen
3 Project tools
- Software intelligentCAPTURE installed locally and connected to scanner
- Workflow:
- Identification of document via barcode
- Scanning table of contents of books
- Character recognition process (OCR)
- Quick check of result of OCR
3 Project tools
- Workflow (cont):
- Generation of PDF file
- Compression of files
- Automatic indexing (CAI engine)
- Transfer of PDF file to file system
- Export of indexing results and PDF files to:
  - Local library system
  - Local intelligentSEARCH database
  - Central database, hosted by AGI
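The workflow above can be read as a fixed sequence of processing steps followed by an export to three targets. The sketch below only models that ordering; the class, function, and step names are hypothetical and are not part of intelligentCAPTURE.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    """A scanned book, identified by its barcode (hypothetical model)."""
    barcode: str
    steps_done: List[str] = field(default_factory=list)


# Workflow steps in the order listed on the slides.
WORKFLOW = [
    "identify_by_barcode",
    "scan_table_of_contents",
    "ocr",
    "quick_check_ocr",
    "generate_pdf",
    "compress",
    "automatic_indexing",
    "transfer_pdf_to_file_system",
    "export",
]

# Export targets named on the slide.
EXPORT_TARGETS = [
    "local library system",
    "local intelligentSEARCH database",
    "central database (hosted by AGI)",
]


def run_workflow(doc: Document) -> Document:
    """Run every step in order; each step here only records its name."""
    for step in WORKFLOW:
        doc.steps_done.append(step)
    return doc
```

Calling `run_workflow(Document("0001"))` returns the document with all nine step names recorded in order; a real implementation would replace each recorded name with the actual scanning, OCR, or export call.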
3 Project tools
- Automatic indexing is much more specific and comprehensive than title indexing plus intellectual indexing with a controlled vocabulary
- Document analysis based on linguistic methods and procedures from computational linguistics
- All words are reduced to their linguistic base form (morphemes)
- Uses large semantic nets (thesauri, topic maps etc.)
- Statistical rules for relevance ranking
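The three ingredients listed above (reduction to base forms, lookup in a semantic net, and statistical ranking) can be illustrated with a toy example. This is a minimal sketch under strong assumptions: the tiny `BASE_FORMS` and `THESAURUS` tables are invented for illustration, and plain term frequency stands in for whatever ranking rules the CAI engine actually applies.

```python
import re
from collections import Counter

# Hypothetical toy data; a real system would use a full morphological
# lexicon and a large thesaurus.
BASE_FORMS = {
    "libraries": "library",
    "catalogues": "catalogue",
    "indexed": "index",
    "indexing": "index",
}
THESAURUS = {
    "library": "Library science",
    "catalogue": "Library catalogue",
    "index": "Indexing",
}
STOPWORDS = {"the", "of", "and", "in", "are", "is", "a"}


def to_base_form(word):
    """Map a word to its base form, or return it unchanged if unknown."""
    return BASE_FORMS.get(word, word)


def index_text(text, top_n=3):
    """Return (base form, thesaurus descriptor or None, frequency) tuples."""
    words = re.findall(r"[a-zäöüß]+", text.lower())
    bases = [to_base_form(w) for w in words if w not in STOPWORDS]
    counts = Counter(bases)
    # Rank by descending frequency, breaking ties alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(term, THESAURUS.get(term), n) for term, n in ranked[:top_n]]
```

Because inflected variants collapse onto one base form before counting, "libraries" and "library" contribute to the same index term, which is the effect the slide describes.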
3 Project tools
- Output of most important terms in groups:
- geographical terms
- personal/corporate terms
- branches / areas of activity
- descriptors: words from internal thesaurus
- important words and phrases from text
- Libraries: use a broad generic thesaurus of approx. 300,000 German terms and a smaller English thesaurus
- Languages: German and English in use, French and Spanish available
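The grouped output described above can be pictured as a mapping from group name to extracted terms. The sketch below assumes simple lookup lists per group; the vocabularies and the `group_terms` function are hypothetical and only show the shape of the result, not how the engine classifies terms.

```python
# Hypothetical lookup lists, one per output group named on the slide.
GROUPS = [
    ("geographical terms", {"vorarlberg", "konstanz"}),
    ("personal/corporate terms", {"agi", "ibm"}),
    ("branches / areas of activity", {"library services"}),
    ("descriptors", {"indexing", "thesaurus"}),
]
FALLBACK = "important words and phrases"


def group_terms(terms):
    """Assign each term to the first matching group, else to the fallback."""
    result = {name: [] for name, _ in GROUPS}
    result[FALLBACK] = []
    for term in terms:
        key = term.lower()
        for name, vocabulary in GROUPS:
            if key in vocabulary:
                result[name].append(term)
                break
        else:
            # No group vocabulary contained the term.
            result[FALLBACK].append(term)
    return result
```

For example, `group_terms(["Vorarlberg", "IBM", "indexing", "table of contents"])` places the first three terms in their groups and leaves "table of contents" under important words and phrases.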
[Diagram: Libraries 1-3, each with iCAPT and ILS, send indexing data and PDF files to AGI]
3 Project tools
- Search engine with a simple (Google-like) interface and IBM GTR (Global Text Retrieval) as core engine
- Search terms input -> automatically expanded semantically
- Main features of GTR:
  - Operators: Boolean, adjacency, near, paragraph, sentence
  - Right and left truncation, wildcard, fuzzy searching
  - Sorting by relevance
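Several of the operator types listed above can be illustrated with the Python standard library. This is not the GTR API: `fnmatch` patterns stand in for truncation and wildcards, `difflib` for fuzzy searching, and a word-distance check for the near operator. The vocabulary and function names are assumptions made for the example.

```python
import difflib
import fnmatch

# Hypothetical index vocabulary.
VOCABULARY = ["library", "librarian", "libraries",
              "catalogue", "indexing", "reindexing"]


def wildcard_search(pattern, vocabulary=VOCABULARY):
    """'librar*' is right truncation, '*indexing' is left truncation."""
    return [w for w in vocabulary if fnmatch.fnmatchcase(w, pattern)]


def fuzzy_search(term, vocabulary=VOCABULARY, cutoff=0.8):
    """Return vocabulary entries that are similar to a (misspelled) term."""
    return difflib.get_close_matches(term, vocabulary, n=5, cutoff=cutoff)


def near(words, a, b, distance=3):
    """True if a and b occur within `distance` word positions of each other."""
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)


def boolean_and(words, terms):
    """True if every term occurs in the word list."""
    return all(t in words for t in terms)
```

With this vocabulary, `wildcard_search("librar*")` matches the three words starting with "librar", and `near` with `distance=1` corresponds to an adjacency operator.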
3 Project tools
- AGI developed features:
- Highlighting
- Interfaces to library systems, booksellers, and the web via Google
- Query expansion by semantic nets
- Visualization and browsing of topic maps
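Query expansion and highlighting, two of the features listed above, can be sketched in a few lines. The `SEMANTIC_NET` table is a hypothetical stand-in for the thesauri and topic maps, and the bracket markers are an arbitrary choice; neither reflects how intelligentSEARCH is implemented.

```python
import re

# Hypothetical semantic net: each term maps to related terms.
SEMANTIC_NET = {
    "library": ["libraries", "bibliothek"],
    "catalogue": ["catalog", "opac"],
}


def expand_query(terms):
    """Add related terms from the semantic net, keeping order, no duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term] + SEMANTIC_NET.get(term.lower(), []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def highlight(text, terms, left="[", right="]"):
    """Wrap whole-word, case-insensitive matches of any term in markers."""
    if not terms:
        return text
    # Longest terms first so that longer alternatives are tried before
    # shorter ones that share a prefix.
    alternatives = "|".join(
        re.escape(t) for t in sorted(terms, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: left + m.group(0) + right, text)
```

A search for "library" would then be run with `expand_query(["library"])`, and the expanded list can be passed to `highlight` so that variants found through the semantic net are also marked in the result display.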
4 Project Results
- Library OPAC Vorarlberger Landesbibliothek: http://vlb-katalog.vorarlberg.at
- Portal: www.dandelon.com
4 Project results
- Portal with semantic search engine (intelligentSEARCH)
- Content: automatically indexed contents pages of books and other publications; PDF files of the contents pages
- Search terms expanded semantically
- Relevance ranking
- Highlighting
4 Project results
- Links to libraries holding the book, to booksellers, to internet search engines
- View topic maps