A Distributed Database and Processing System for Watermarks: an INTAS Project

Emanuel Wenger (*), Victor Karnaukhov (#), Alois Haidinger (+), Nikolai Merzlyakov (#), Gerard van Thienen (@), Elena Oukhanova (%), and Dmitry Erastov (-)

(*) Austrian Academy of Sciences, Commission of Scientific Visualization, Vienna, Austria

E-mail: emanuel.wenger@oeaw.ac.at

(#) Russian Academy of Sciences, Institute for Information Transmission Problems, Moscow, Russia

E-mail: victor.karnaukhov@iitp.ru, nikolai.merzlyakov@iitp.ru

(+) Austr. Acad. of Sci., Comm. of Paleography a. Codicology of Medieval Manuscripts, Vienna, Austria

E-mail: alois.haidinger@oeaw.ac.at

(@) Koninklijke Bibliotheek, The Hague, The Netherlands

E-mail: Gerard.vanthienen@kb.nl

(%) State Historical Museum, Dep. of Manuscripts and Old Books, Moscow, Russia

E-mail: manuscript@shm.ru

(-) Russ. Acad. of Sci., Laboratory of Conservation and Restoration of Documents, St. Petersburg, Russia

E-mail: manuscript@nlr.ru

Abstract

Watermarks are the most important tool for dating undated documents written on paper. Hence, catalogs and databases of watermarks play a prominent role in the work of medievalists and paper historians. The proposed project aims to improve this tool in two directions.

One is the development of a distributed watermark database and processing system, which allows access to local watermark databases with different structures in different locations and the processing of images from different sources. Such a distributed system will be implemented as a client/server system based on the Internet. This will make remote access to watermarks from different collections and sources possible. Common access to watermarks in Russian and Western European libraries will be of major benefit to historians on both sides. Russian libraries keep many manuscripts and early printed books from Western Europe. As paper production in Russia started only in the 18th century, both sides share the same watermarks. So far, only a small part of the watermarks from Russian collections has been published worldwide, based on hand-made tracings.

The second direction is to improve watermark extraction by new radiographic methods and optical techniques. This will make the production of watermark hardcopies faster and cheaper and improve their quality significantly.

The project will give a tremendous boost to watermark research in Russia, and interesting findings for historians can be expected. The project will be implemented by five groups (2 INTAS groups and 3 groups from Russia), which will have access to huge collections of manuscripts and early printed books in Russia, the Netherlands, and Austria.

Objectives

Comparison of watermarks is the most important and most widely used methodology to date undated paper sources (medieval manuscripts and early printed books). The aim of the project is to raise this important methodology to a new level of quality by using modern computer and network technology in a multidisciplinary cooperation between historians, computer scientists, and radiology physicists.

The main objective is the construction of a digital distributed watermark database, allowing fast retrieval, precise comparison, and efficient storage of watermarks. This will be achieved in two stages: first, the construction of spatially separate watermark databases that cover the local sources and are based on the same technological principles; second, connecting them into a distributed database with modern network technology. The exchange of watermark data between Western Europe and Russia benefits both sides because paper production in Russia started only in the 18th century, and the Russian libraries in Moscow and St. Petersburg contain a significant quantity of western manuscripts and printed books owing to the collecting activities of Russian aristocrats before the revolution.

In parallel to the construction of the distributed database, efforts will be made to improve the extraction (recording) of watermarks. As early as the late 1940s, Russian scientists from St. Petersburg introduced beta radiography for the extraction of watermarks. Further experiments with different isotopes, materials, and exposure times will be conducted.

Background & Justification for Undertaking the Project

Manuscripts and early printed books represent an important part of our cultural heritage.

Investigation, cataloging, and restoration of these sources are necessary in order to preserve our heritage for the future. Often, these documents are not dated, but the knowledge about when they were written or printed is essential for historical research. The comparison of dated watermarks with undated ones is the major method for dating undated handwritten (paper) documents and incunabula.

Several printed standard catalogues exist containing thousands of hand-drawn tracings of watermarks. The most important of these watermark collections were published by Briquet [1] and Piccard [2]. The identity of a watermark with one in the standard catalogues is a good indicator of the age of the watermark and hence of the document in question.

There are some serious obstacles to the precise dating of the watermarks in a document using the standard catalogues of hand-drawn tracings. In many cases the watermarks of a document are covered by the written text, so that it is impossible to produce good hand-drawn tracings. But even if it is possible to make a perfect sketch of the watermark, it is still time-consuming and tedious to find an identical watermark among hundreds of watermarks of the same motif in the catalogues. Furthermore, all catalogues are incomplete, and there is no guarantee that the search for an identical watermark will lead to a result. Additionally, it appears that proof of the identity of the watermark is not sufficient for a reliable dating result: the identity of the sieve has to be demonstrated, too.

A way to overcome all these drawbacks is to use computers, which are ideal tools for cataloging, comparing, and retrieving huge amounts of data. Considerable work has already been done in this direction (e.g. see [3-6]).

Watermarks are small deviations in the thickness of paper. Before a watermark can be entered into the computer, a hardcopy has to be produced, e.g. by hand (as a sketch), rubdown, electron radiography, or betaradiography. Betaradiography and electron radiography seem to be the best of these methods for an unambiguous recording of watermarks because of their good quality: small deviations in the density of the paper are recorded sufficiently well. But even these hardcopies have low contrast, except for some very bright or very dark spots and areas caused by holes in the paper or ink of a specific color. A drum scanner or a sensitive flatbed scanner with a transparency extension can be used for scanning. The recordings are scanned, digitally processed, contrast enhanced, and stored in a database as images.

All high-quality watermark acquisition methods (betaradiography, electron radiography) are either time-consuming (exposure time of several hours) or expensive (more than 10 USD per record). One group of the Department of Document Restoration and Conservation of the Russian Academy of Sciences has been working on watermark recording methods since the end of the 1940s and introduced betaradiography for hardcopy production of watermarks, a technique that is nowadays used by many historians worldwide. Work in progress with different kinds of isotopes and film material has already shown promising results: exposure times of a few minutes at low cost.

Besides the databases of the applicants, there are some other digital watermark databases under development that can be accessed via the WWW. The International Association of Paper Historians (IPH) put a provisional version of a watermark database system (http://www.paperhistory.org/database.htm) for storing watermark data and images according to the IPH Standard on the Web around 1996 (http://cuisun8.unige.ch/NSAP/rauber/watermark). Some dozens of scanned watermark tracings and 99 images of watermark photographs are included in this database for demonstration purposes only. At about the same time a group at Bates College, Lewiston (Maine), started to put images of watermarks in Greek manuscripts on the Web (http://www.bates.edu/Faculty/wmarchive/). This database is restricted to Greek manuscripts. So far about 200 images of watermarks (all produced in 1995) are accessible. Collections of lesser importance are The Thomas L. Gravell Watermark Archive (http://128.173.125.124:591/DBs/Gravell/default.htm) and A Digital Catalogue of Watermarks and Type Ornaments Used by William Stansby in the Printing of The Works of Beniamin Jonson (http://jefferson.village.virginia.edu/gants/).

All digital watermark databases under development so far are isolated and contain only watermarks found in local libraries or archives. Hence, one main objective is to develop tools that allow remote access over the Internet to several different watermark databases and the sharing of data between them. Such a distributed watermark database greatly enlarges the available data, so that the efficiency of watermark identification and classification and the accuracy of dating can be significantly increased.

The exchange of watermarks and access to a common distributed watermark database is of great advantage to historians in Russia and in Western Europe. The papers, and hence also the watermarks, used until the 18th century are the same. Furthermore, Russian libraries have significant holdings of western medieval documents, which were collected by rich Russian aristocrats in the 19th century. The institutions with the biggest collections of ancient manuscripts in Russia (Russian National Library, Russian State Library, State Historical Museum, and Library of the Russian Academy of Sciences) have been chosen to take part in this project. Most of the watermarks in Russia have not yet been published internationally or have not been available at all until now.

For computer science, this project contains challenging tasks of research, development, and testing of modern methods for cooperative work as well as the development of image processing methods for the identification of similar or identical watermarks. Remote access to data and programs will be realized over the Internet. Digital image processing methods have to be improved and generalized in order to meet the goals of generality, ease of use, and user-friendliness.

Project Consortium

A project consortium was established for project preparation and implementation. It consists of the following five research groups:

1. The coordinator group headed by Emanuel WENGER consists of researchers working at the Commission for Paleography and Codicology of Medieval Manuscripts and the Commission for Scientific Visualization (Vienna, Austria) of the Austrian Academy of Sciences. This group has the longest experience in watermark processing among all groups of the project consortium. The group includes historians and computer scientists who, as early as the early 1990s, had the idea of building a digital database for watermarks. Supported by national grants, they built a watermark database and processing system in cooperation with the second group [7-9]. The results of this work were presented at several conferences and have been published. Based on X-ray and beta radiography images, the current database has about 4000 entries. All the watermark photographs were taken from medieval manuscripts in Austrian libraries, mainly from manuscripts in the monastery library in Klosterneuburg near Vienna. The database has proved many times to be a very useful tool for the identification of watermarks and the dating of undated documents. Parts of the database are accessible over the Internet (http://www.oeaw.ac.at/ksbm/wz/wwwdb/).

2. The software and database development and management group consists of computer scientists working in the Institute for Information Transmission Problems of the Russian Academy of Sciences (Moscow). This group, headed by Victor KARNAUKHOV, has been involved in the watermark activities from the beginning, jointly with the first group. These first two groups initiated the activity and established the project consortium, and they published their joint results in this area together [3-6]. This group includes specialists in digital image processing, database management, and software development. In cooperation with the first group, a database system based on Paradox and Borland C++ Builder tools has been designed and implemented. Additionally, a set of processing tools was developed.

3. The incunabula and watermark research group headed by Gerard van THIENEN consists of incunabulists and researchers working at the Department of Special Collections and the Department of Optical Technics of the Koninklijke Bibliotheek (The Hague, The Netherlands). This group also has long-term experience in watermark research. The aim is a narrower dating of the 1200 undated incunabula among the total of 2000 editions printed on paper in the Low Countries (present-day Netherlands and Belgium). The largest collection of Low Countries incunabula is in the Koninklijke Bibliotheek in The Hague (900). Since 1990 more than 18,000 rubbings of watermarks have been made from one or more copies of 1950 editions in libraries all over the world. Furthermore, 2700 electron radiographs have been made. These have been scanned and a watermark database has been constructed. The database now contains almost 2700 images with descriptions and is growing fast (see www.kb.nl/wilc). This group is also involved in research on incunabula printed in Spain, Germany, France, and England.

4. The medieval manuscript research group headed by Elena OUKHANOVA consists of historians working in the Department of Medieval Manuscripts of the State Historical Museum and the Russian State Library (Moscow, Russia). The State Historical Museum and the biggest library of Russia have large collections of manuscripts and early printed books. During the last decades thousands of watermark tracings were made, of which so far only a small number have been published. This group will provide regular access to the largest quantity of watermarks in Russia. The data sources provided by this group are Slavonic manuscripts from the 15th century (about 700, containing an estimated 3000–5000 watermarks). Their paper was manufactured in Western Europe (mainly in France), so the “Russian” material is directly correlated with the European material and can be joined easily with already existing databases of watermarks. Furthermore, this group will investigate the possibility of digital photography with special effects for the acquisition of watermark copies. The Department of Medieval Manuscripts of the State Historical Museum is a leader in the investigation of medieval watermarks in Russia. This research center has prepared and published (in Russian) 5 catalogues of watermarks during the last 20 years.

5. The watermark copying and acquisition group headed by Dmitry ERASTOV consists of researchers working at the Department of Document Restoration and Conservation of the Russian Academy of Sciences (St. Petersburg) and the Russian National Library (St. Petersburg). This group is one of the most advanced watermark research groups in Russia. It uses X-radiography and betaradiography for producing hardcopies of watermarks. This group is experienced and very well equipped for the development of methods and technology for contactless hard copying of medieval watermarks. An integrated watermark acquisition (based on optical subtraction), processing, and expert system has been developed by this group, too.

Scientific and Technical Description of the Project

Development of a distributed database system of watermarks

Analysis of existing printed catalogues and databases of watermarks with the aim of finding common features and needs and defining general requirements and recommendations for watermark databases. Recommendations will be elaborated for the structure and data formats of watermark databases. These recommended structures and data formats will be implemented and tested in the databases that will be developed from scratch by the NIS groups.

Development of relational databases of medieval watermarks, incunabula, and manuscripts in accordance with the elaborated requirements.

Local versions of watermark databases for the Russian groups will be developed in accordance with the requirements. These databases will be used for testing and optimization of the table structures and user interface forms. The development of the databases will be accomplished by the integration of application software tools.
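A minimal sketch of what such a relational structure could look like is given here, using Python's built-in sqlite3 module purely for illustration (the system described in this paper is based on Paradox). All table names, column names, and sample rows are hypothetical assumptions and do not reproduce the project's actual schema.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the structure used by the project.
SCHEMA = """
CREATE TABLE manuscript (
    id        INTEGER PRIMARY KEY,
    library   TEXT NOT NULL,
    shelfmark TEXT NOT NULL
);
CREATE TABLE watermark (
    id            INTEGER PRIMARY KEY,
    manuscript_id INTEGER NOT NULL REFERENCES manuscript(id),
    motif         TEXT NOT NULL,   -- e.g. 'bull head'
    height_mm     REAL,            -- a metric characteristic
    year_from     INTEGER,         -- chronological characteristics
    year_to       INTEGER,
    image_path    TEXT             -- scanned hardcopy
);
"""

def build_demo_db():
    """Create an in-memory database with one manuscript and two watermarks."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO manuscript VALUES (1, 'Demo Library', 'Cod. 1')")
    con.executemany(
        "INSERT INTO watermark VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, 1, "bull head", 62.0, 1440, 1445, "wm0001.tif"),
            (2, 1, "anchor", 48.5, 1460, 1462, "wm0002.tif"),
        ],
    )
    return con

def find_by_motif(con, motif):
    """Return (shelfmark, year_from, year_to) for all watermarks of a motif."""
    rows = con.execute(
        "SELECT m.shelfmark, w.year_from, w.year_to "
        "FROM watermark w JOIN manuscript m ON m.id = w.manuscript_id "
        "WHERE w.motif = ? ORDER BY w.year_from",
        (motif,),
    )
    return rows.fetchall()
```

Keeping manuscripts and watermarks in separate tables means the general data about a manuscript are stored once and shared by all watermarks found in it.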

Development of a distributed database and interface system, which allows the mutual access between differently structured databases.

Special user interface forms for user interaction with the distributed database will be developed. These interfaces, supported by the application software, will be used for the user-friendly formulation of complex queries to the distributed database. Non-expert users will also get the opportunity to access and share data accumulated in all local databases involved in the project.
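One possible way to give a single query access to differently structured local databases is a per-source field mapping, sketched below. This is an illustration under stated assumptions, not the project's design: the class, the field names, and the sample records are all hypothetical, and the sketch assumes that motif names have already been harmonized across sources.

```python
# Each local database keeps its own field names; a per-source mapping
# translates them to a shared vocabulary. All names here are hypothetical.

class LocalSource:
    def __init__(self, name, records, field_map):
        self.name = name
        self.records = records      # list of dicts in the local structure
        self.field_map = field_map  # common field name -> local field name

    def query(self, motif):
        """Return matching records translated into the common structure."""
        local_motif = self.field_map["motif"]
        local_year = self.field_map["year"]
        return [
            {"motif": r[local_motif], "year": r[local_year], "source": self.name}
            for r in self.records
            if r[local_motif] == motif
        ]

def distributed_query(sources, motif):
    """Send the same query to every source and merge the results by year."""
    merged = []
    for src in sources:
        merged.extend(src.query(motif))
    return sorted(merged, key=lambda rec: rec["year"])

# Two sources with different local structures (sample data is invented).
vienna = LocalSource(
    "Vienna",
    [{"Motiv": "anchor", "Jahr": 1452}, {"Motiv": "tower", "Jahr": 1430}],
    {"motif": "Motiv", "year": "Jahr"},
)
moscow = LocalSource(
    "Moscow",
    [{"type": "anchor", "dated": 1448}],
    {"motif": "type", "year": "dated"},
)
```

Calling distributed_query([vienna, moscow], "anchor") returns the records from both sources in one common structure, so the user does not need to know how each local database names its fields.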

Methods and technology for producing watermark hardcopies from manuscripts and early printed books.

Watermarks cannot be recorded directly from the paper. Watermarks are small variations in the thickness of the paper. They are artifacts in the paper with extremely low contrast. Specific methods like rubbing, X-radiography, betaradiography, and photography with special effects and filters produce a replica of the watermark that can be entered into the computer. In practice, each watermark researcher has to devise his or her own method for producing watermark hardcopies. Existing methods and technology for producing hardcopies of watermarks of medieval manuscripts and incunabula will be analyzed and new ones developed. The known existing methods for hardcopy production should be studied carefully, and recommendations should be formulated as a result. Safe and improved methods for producing hardcopies of watermarks from medieval manuscripts and incunabula will be the expected result of this task.

In this task the main attention will be given to betaradiography methods. Preliminary estimations and experiments carried out by the project consortium with a beta source based on the isotope technetium-99 demonstrated very promising results. This beta source uses metallic technetium-99 (industrially producible in Russia). The long half-life of this isotope (more than 200,000 years) guarantees stable reproducibility of results. The very high concentration of the isotope in the source ensures a short exposure time of about 3-5 minutes, which is about 50 times shorter than for a typical beta source.
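The two figures quoted above can be checked with a short calculation. The sketch below assumes simple exponential decay and uses 200,000 years as the half-life, which is only the lower bound given in the text; the function names are illustrative.

```python
HALF_LIFE_YEARS = 200_000  # lower bound quoted in the text; an assumption

def remaining_fraction(years, half_life=HALF_LIFE_YEARS):
    """Fraction of the original activity left after `years` of decay."""
    return 0.5 ** (years / half_life)

def typical_exposure_minutes(tc99_minutes, speedup=50):
    """Exposure implied for a typical source if Tc-99 is `speedup` times faster."""
    return tc99_minutes * speedup
```

Under these assumptions the source loses less than 0.04% of its activity in a century, so exposures made decades apart are directly comparable, and the quoted 3-5 minutes corresponds to roughly 150-250 minutes for a typical beta source.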

Digital processing of watermark images.

The watermark processing starts with the input of the watermark hardcopy into a computer. The visual quality of scanned hardcopies produced by different methods differs significantly; hence, special image preprocessing tools for each kind of hardcopy have to be applied before the image is stored in the database.

Development of digital methods for watermark preprocessing. The most important step in the preprocessing stage is image enhancement because in general the images have low contrast and are corrupted by noise. The preprocessing step will yield the output watermark image with a normalized set of visual parameters. Methods and software tools for geometric transformations, noise suppression, image enhancement and normalization will be developed, implemented, and included into the integrated system.
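As an illustration of the kind of preprocessing meant here, the following sketch applies a 3x3 median filter for noise suppression followed by a linear contrast stretch for normalization. It is not the project's software: it works on a plain list of rows of grey values (0-255), and the function names are assumptions chosen for this example.

```python
from statistics import median

def median_filter3(img):
    """Suppress impulse noise with a 3x3 median; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out

def stretch_contrast(img, lo=0, hi=255):
    """Linearly map the image's grey-level range onto [lo, hi]."""
    flat = [p for row in img for p in row]
    mn, mx = min(flat), max(flat)
    if mx == mn:  # flat image: nothing to stretch
        return [[lo for _ in row] for row in img]
    scale = (hi - lo) / (mx - mn)
    return [[round(lo + (p - mn) * scale) for p in row] for row in img]

def preprocess(img):
    """Noise suppression followed by normalization to the full grey range."""
    return stretch_contrast(median_filter3(img))
```

A plain min-max stretch is sensitive to isolated extreme values, such as very bright spots caused by holes in the paper, so a practical tool would probably clip a small percentile at both ends before stretching.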

Methods and tools for the computer-aided extraction of watermark contours. Watermark contours (tracings) are mostly used for watermark identification. These tracings are published in standard catalogues and, in general, form the basis of the catalogues of any watermark researcher. So, it will be a very attractive feature of the proposed system to provide potential users with easy-to-use software tools for the computer-aided extraction of watermark contours from their hardcopies.
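A very simple form of contour extraction can be sketched as thresholding followed by boundary detection. The example below is an assumption-laden illustration rather than the method used in the project: it assumes the watermark appears brighter than the surrounding paper, that the user chooses the threshold interactively, and that the image is a list of rows of grey values.

```python
def binarize(img, threshold):
    """Mark pixels brighter than `threshold` as watermark (1), the rest as 0."""
    return [[1 if p > threshold else 0 for p in row] for row in img]

def contour_pixels(mask):
    """Return (row, col) of foreground pixels that touch background or the image edge."""
    h, w = len(mask), len(mask[0])
    contour = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            # A foreground pixel is on the contour if any 4-neighbour
            # lies outside the image or belongs to the background.
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    contour.append((y, x))
                    break
    return contour
```

For a filled 3x3 bright square inside a dark 5x5 image, this returns the eight outer pixels of the square and omits its center, which is the behavior expected of an outline.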

Integrated system for database management and watermark processing

All proposed software tools will be integrated into one system. This approach will give the user the opportunity to perform all necessary tasks within one system. The system will be implemented on a PC-based platform and will run under Windows 95/98/NT-2000 or a later version of the operating system. The interaction of the user with the integrated system will be realized by means of a graphical user interface containing a set of menus, buttons, and other control elements. This graphical user interface will be designed in the Windows style because it is the most widely used.

Population of the distributed database.

The population of all locally developed databases will start after completion of T1.2 and will proceed on a regular basis. This is routine work, but it is the most time-consuming stage. Digital copies of the watermarks have to be made, all their metric, chronological, and other characteristics have to be recorded, and the general data about the manuscript in which the watermark has been found have to be given.

Exploitation & Dissemination of Results

For dissemination of results, it is planned to create a multilingual Web site with one mirror in Moscow to simplify access for Russian researchers. This Web site will disseminate information about the project, will allow registered users worldwide access to the distributed database via the Internet, and will encourage other groups to participate and include their watermark data in the distributed system. A long-term goal of the project is to create an all-European watermark database system.

Acknowledgements

This work is sponsored by INTAS as project INTAS 00–00081. Parts of the work are supported by the Russian Foundation for Basic Research under project number 01-07-90354 and by the Austrian Science Fund as grant FWF-Projekt P13298-ARS.

References

1. Briquet Ch. M., Les Filigranes. Dictionnaire historique des marques du papier dès leur apparition vers 1282 jusqu’en 1600. Paris 1907.

2. Piccard G., Die Wasserzeichenkartei Piccard im Hauptstaatsarchiv Stuttgart. Findbuch I-XV, W. Kohlhammer, Stuttgart, 1966-1987.

3. Wenger E., Karnaukhov V., Merzlyakov N., Haidinger A., A Digital Image and Database System of Medieval Watermarks. In: Leonid Kujbyshev and Nadezhda Brakker, editors, EVA '99 Proceedings New Information Technologies in the Cultural Area in the New Millenium, pp. 321-324, Moscow, 1999.

4. Karnaukhov V. N., Wenger E., Haidinger A., Merzlyakov N. S., Zhang Y. J., An Integrated System for Digital Processing and Identification of Watermark Images. First International Conference on Image and Graphics, August 16-18, Tianjin, China, 2000, pp. 119-122.

5. Karnaukhov V. N., Wenger E., Merzlyakov N. S., Haidinger A., and Lackner F., Thematic processing and retrieving of watermarks. Proceedings of SPIE Vol. 2363, 1995, pp. 32-39.

6. Karnaukhov V. N., Merzlyakov N. S., Wenger E., Haidinger A., and Lackner F., Digital Analysis of Watermarks of Medieval Manuscripts. Computer Optics, Vol. 14-15, 1995, pp. 11-24 (in Russian).

7. Project: BWK0028/BMWFK – OWCR38 “Wasserzeichen-Repertorium als Text-Graphik Datenbank” in the framework of the Ost-West Program supported by the Austrian Ministry of Science and Research (1991-1993).

8. Project: P-11549-HIS “Watermarks of the medieval manuscripts in Klosterneuburg” (“Die Wasserzeichen in den mittelalterlichen Handschriften des Stiftes Klosterneuburg”) supported by the Austrian FWF (Foundation for Scientific Research) (1995-1998). Cf. http://www.oeaw.ac.at/ksbm/wz/fwf11549.htm

9. Project: P-13298-HIS “Watermarks of manuscripts in Klosterneuburg” (“Wasserzeichen Klosterneuburger Handschriften”) supported by the Austrian FWF (Foundation for Scientific Research) (1999-2002). Cf. http://www.oeaw.ac.at/ksbm/wz/fwf13298.htm
