Page-image recipe databases, a new approach for accessing art technological manuscripts and rare printed sources: the Winsor & Newton archive prototype

Mark Clarke and Leslie Carlyle
Instituut Collectie Nederland
Gabriël Metsustraat 8
1071 EA Amsterdam
The Netherlands
E-mail: mark.clarke@icn.nl; leslie.carlyle@icn.nl

Abstract

This paper describes a new database approach for making the contents of a documentary source widely accessible without the need for exhaustive transcription or complex editing. The electronic availability of historic recipes greatly facilitates correlation between the recipes and analytical results from historic art works, thus serving conservators and conservation scientists, as well as art historians and curators. The database incorporates full page images from the primary source, alongside an index and summary of individual page contents. This removes the problems of full-text entry, and allows the rapid generation of indices. As the original page is always visible to the user, subsequent researchers are not restricted to interpretations within the database entries. A fully functional pilot database is being built for the 19th century archive of Winsor & Newton, which contains recipes and processes for making oil painter’s materials.

Keywords

database, archive digitization, colourman, recipe, oil paint, watercolour, 19th century, England

Introduction

Many manuscripts covering art technology survive from all periods, such as artists’ recipes, or descriptions and diagrams of technical or manufacturing processes. A large proportion of these are unpublished, and consequently almost wholly unknown or inaccessible.

Interpretation of the texts

Preparing transcriptions of manuscripts or rare printed sources is slow and problematic, and printed editions are expensive to produce.
However faithful the editor is to the original, some interpretation is always necessary: conjectural readings must be provided for missing or illegible passages and for abbreviations and symbols. The language of such old texts is often obscure or obsolete, further complicating the reading. With manuscripts that have been added to and modified over time, it is awkward to decide which parts of the text are later additions or annotations. Such clues as crossing-out, pasted-in slips, marginal or interlinear notes, pencilled additions, changes in handwriting, or passages in the same hand but different ink can be essential for working out the original text, as well as how it was used. This information is lost in a printed edition.

Therefore, in the past, indexing and interpretation of manuscripts have involved long, complex, slow transcriptions by an editor familiar with the subject, language and handwriting of the period. The result, however sophisticated, remains an interpretation fixed in time. As knowledge of a field increases, original interpretations require revision. Indeed, the history of researching and editing manuscripts is one of repeatedly returning to the original for reconsideration.

Accessibility

With important manuscripts (and printed books) the solution for creating wider accessibility has often been to publish a facsimile. This should be straightforward, requiring only good photography and printing, but unfortunately quality printing, especially in colour, is very expensive. The much reduced quality of inexpensive editions, microfilms or photocopies makes close reading of complex pages difficult at best. Furthermore, manuscript reproductions require each reader to be his or her own editor.

With individual manuscripts and whole archives, there is no way to easily find material relevant to any specific enquiry other than to struggle through page by page. The solution is to make the original pages accessible and to prepare a detailed index.
This has been successfully done on the Roberson Colourman’s archive at the Hamilton Kerr Institute in Cambridge, UK, where a computer database was used to generate a printed index of artists’ accounts in the archive (Woodcock 1995, Woodcock and Churchman 1997). Although this elegant solution makes access to this part of the archive straightforward, researchers must nevertheless present themselves in person and handle the delicate old documents each time they require information. Furthermore, each individual accesses the archive for their own purposes, and in many cases their extensive findings are only available in summary form in later publications.

The page-image database

A new solution is to combine the advantages of a computer-based indexing system with digitized images from each page of a manuscript, removing the necessity of visiting the primary source in person, and reducing the need for repeated handling of rare and fragile material. Such a database has been designed for the 19th century recipe books of Winsor & Newton, the English manufacturer and dealer in artists’ materials.

Pilot project: 19th century colourmen’s archives at Winsor & Newton

For the first time in its 170-year history, Winsor & Newton is making its comprehensive archive available to outside researchers. It includes detailed recipes for making artists’ oil paint and water colours, consisting of 95 handwritten books of recipes and workshop notes, totalling 16,500 pages. The pilot database concentrates on the portion containing oil paint recipes, some 10,700 pages. Daily working notes and corrections are extremely valuable, showing how processes were revised to improve quality, reduce expense, or take account of variability in raw materials. Clearly Winsor & Newton were continuously concerned with controlling the quality of their products: the books include observations regarding the choice of raw materials with details of testing for durability and ageing.
Being able to determine what materials were used to create a given colour or tube paint is essential for interpreting analytical data from actual 19th century paintings, a period when the colourmen were playing a greater role in the choice of paint ingredients. According to the Winsor & Newton archive, many tube paints contained mixtures of pigments and in some the binder was a mixture of oils (linseed and poppy). Knowing whether an artist mixed a given set of pigments to achieve a colour, or added certain materials to the paint, or whether this was already done by the paint supplier, is significant in understanding an individual artist’s relationship to his or her materials during this period.

Example of use: Corot’s tube paints

Searching the archive by materials is invaluable for paintings research. If an unexpected compound or element is noted during analysis of a paint sample, possible explanations for its presence may be found in the archive database. Undated tubes of paint in a paint-box that belonged to Corot (1796–1875) were analysed by Hermens et al. (2002). A partly legible label from Winsor & Newton read ‘Roman…’. The authors speculated that this might be Roman Ochre, but were puzzled because: ‘We found, however, both yellow ochre and a small amount of chrome yellow…’. The Winsor & Newton database was searched for ‘Roman’ plus ‘chrome’. This located 12 recipes including two, dated before 1854, for ‘Roman Ochre in Oil. (Artists’).’ Both consisted of Roman Ochre and Raw Sienna in poppy oil, with the addition of 3 per cent of Deep Chrome. This confirms the identification of the label as appropriate for Roman Ochre and accounts for the presence of chrome yellow. The attribution of the tube to Corot himself is possible as the recipe’s date (circa 1854) falls within his lifetime.

The Winsor & Newton page-image database

An image of a recipe in the original document, and the text-based summary fields for the recipe, are shown simultaneously in two windows.
Thus researchers have access to the original hand-written recipes and notations at all times. In addition, the database allows readers to page through manuscript images as if reading the original book, so they are not confined to accessing pages through the index alone.

Summary fields allow searches for specific recipes, materials and methods, and chronological and alphabetical sorts. Recipes are also indexed by quantities and proportions of materials to enable variations or similarities in preparation to be identified quickly. A program was built to automatically convert from original units to SI units during data-entry so that recipes are recorded in both. Recipe summaries also include working notes, such as comments regarding time taken, purity or quality.

Page-images ensure that indexing is faster than keying in full texts, especially when the texts are hard to read. Using images from the original document means that all diagrams, notes, and so on are also available, but importantly it allows readers to use their own expertise to read passages that might have been incomprehensible to the first compiler of the database. Visual clues mentioned earlier (that is, crossing-out, pasted-in slips, passages in the same hand but different ink, and so on), which are absent from printed editions or electronic transcriptions, are fully evident in the digitized page images.

Indexing limitations

A major problem with compiling or using a comprehensive index or catalogue is that no indexer can predict all of the keywords that any possible future researcher may require. As research on the archive progresses, it is therefore essential that new categories (namely new fields in a database) can be introduced. The choice of a flexible software program (detailed below) to allow this to take place was essential.
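The conversion of original units to SI units at data entry, described above, can be illustrated with a short sketch. This is not the program built for the project: the function name, the record layout and the selection of units are assumptions made for illustration, since the paper does not list which historic units occur in the archive. The factors are the modern legal definitions of the imperial units (the litre is not strictly an SI unit but is accepted for use with it); historic values may differ slightly.

```python
from decimal import Decimal

# Assumed unit table (not taken from the archive): modern definitions of
# some imperial units, each mapped to a factor and a target unit.
TO_SI = {
    "lb": (Decimal("0.45359237"), "kg"),      # avoirdupois pound
    "oz": (Decimal("0.028349523125"), "kg"),  # avoirdupois ounce
    "gal": (Decimal("4.54609"), "l"),         # imperial gallon
    "pt": (Decimal("0.56826125"), "l"),       # imperial pint
}


def record_quantity(amount: str, unit: str) -> dict:
    """Return a record holding both the original and the converted quantity."""
    if unit not in TO_SI:
        raise ValueError(f"no conversion known for unit {unit!r}")
    factor, si_unit = TO_SI[unit]
    original = Decimal(amount)
    return {
        "original_amount": original,
        "original_unit": unit,
        "si_amount": original * factor,
        "si_unit": si_unit,
    }


entry = record_quantity("2", "lb")
print(entry["si_amount"], entry["si_unit"])
```

Keeping the original amount and unit next to the converted value mirrors the stated aim that recipes are recorded in both systems, so a doubtful conversion can always be checked against what the manuscript says.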
The database as a research tool

Database evolution

One of the essential features of this database design allows it to go beyond an indexing system: the database will allow researchers to analyse critically the information themselves and to initiate refined searches of their own design. For example, blank user-definable fields have been set aside to allow users their own custom data entry and retrieval. A researcher interested in temperature during manufacture can access all relevant recipes (for example for a given oil), then enter details of temperature and time into these blank fields. These fields are searchable and can be printed as a separate report.

The intention is that this work could then be incorporated in a version of the database that other researchers will access and build upon according to their interests and expertise. Data extracted by individual researchers, such as temperature and time, will remain for the use of subsequent researchers, perhaps in ways never anticipated by the original compilers or early users. This feature allows a continuing dynamic interaction with the primary data so that the database itself can evolve in the depth and complexity of the information it contains.

To function as a research tool the database must also allow multiple refinements in data analysis. To achieve simplified summary reports there will be several intermediate steps required. The first step generates an overview of the material, for example tables of contents or alphabetical indices of the whole archive, of individual manuscripts, or of found sets of records. This first step then reveals questions that may be asked of the data. Because researchers can use blank fields, they have the opportunity to explore new ways in which the data might be searched, sorted and presented, so as to answer such questions. This is a developing process: the results of each step in the analysis will suggest further productive ways to extract data.
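The idea of blank user-definable fields can be sketched in a few lines. The class, the field names and the sample values below are all hypothetical and are not taken from the Winsor & Newton database; the sketch only shows how data added by one researcher can stay attached to a recipe summary and remain searchable by the next.

```python
from dataclasses import dataclass, field


@dataclass
class RecipeSummary:
    code: str    # unique recipe code
    title: str   # title as indexed
    # Blank fields that later researchers may name and fill in themselves.
    user_fields: dict = field(default_factory=dict)


def find_by_user_field(records, name, value=None):
    """Return records where the user field `name` is filled in
    (and, if `value` is given, equals it)."""
    return [
        r for r in records
        if name in r.user_fields
        and (value is None or r.user_fields[name] == value)
    ]


# Invented sample data, for illustration only.
records = [
    RecipeSummary("X1P001AL01", "hypothetical recipe A"),
    RecipeSummary("X1P002AL05", "hypothetical recipe B"),
]
records[0].user_fields["temperature"] = "made-up value"

for r in find_by_user_field(records, "temperature"):
    print(r.code, r.user_fields["temperature"])
```

Storing the added fields as named entries, instead of fixing them in advance, is one simple way to keep the structure open to questions the original compilers did not foresee.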
Further design considerations

The most important feature of the page-image database is instant and easy access to the images of the original manuscript pages (see Note 1). In the case of the Winsor & Newton archive this is not always straightforward. Often a single recipe consists of coded numerical cross-references to several other steps which form essential parts of the original (for example, first manufacture two or more raw ingredients, then combine these to create a final product). To allow access to each step, links between page-images and recipe summary fields must therefore be many-to-many. The software has been programmed to unite the separate steps in a given recipe into a simple overview table, even when these steps are described in different books and spread throughout the archive.

A problem specific to historic paint recipes is their terminology. Words for materials vary widely, and it is easy to misinterpret a term. It is important to index original words and spellings, not just the presumed interpretation, both to track the history of terminology, and to allow entries to be corrected based on future interpretations. Keyword indexing therefore uses both the original language and interpreted terms, for example ‘copperas’ is also indexed as ‘iron sulphate’. To allow for variability in historic terminology, multiple terms or spellings in the manuscripts are linked to a single interpreted term. Thus a search for the controlled term ‘iron sulphate’ will find multiple relevant entries such as ‘copperas, vitriol, vitreol, chalcanthum, calcanthum’. These lists and correspondences are managed by a further related database. The Winsor & Newton database incorporates a button to take researchers to the authority cited for a given interpretation, usually a published reference. To reflect new information or understanding, additions or alterations to the controlled vocabulary are possible.
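The link between original terms and a controlled term can be shown with a small sketch. The terms are those quoted above; the function names, the data layout and the sample index entries are assumptions for illustration and do not reproduce the related database used in the project.

```python
# Original manuscript terms (from the example above), each linked to a
# single interpreted, controlled term.
VOCABULARY = {
    "copperas": "iron sulphate",
    "vitriol": "iron sulphate",
    "vitreol": "iron sulphate",
    "chalcanthum": "iron sulphate",
    "calcanthum": "iron sulphate",
}


def original_terms_for(controlled_term):
    """List every original spelling linked to a controlled term."""
    return sorted(o for o, c in VOCABULARY.items() if c == controlled_term)


def search(entries, controlled_term):
    """Find recipe codes whose indexed original term maps to the controlled term.

    `entries` is a list of (recipe_code, original_term) pairs.
    """
    wanted = set(original_terms_for(controlled_term))
    return [code for code, term in entries if term.lower() in wanted]


# Invented index entries, for illustration only.
entries = [
    ("X1P001AL01", "Copperas"),
    ("X1P004BL03", "vitreol"),
    ("X1P010AL02", "linseed oil"),
]
print(search(entries, "iron sulphate"))
```

Because the original spelling is kept in the index and only linked to the interpretation, a later correction means changing one line of the vocabulary table, not re-indexing the recipes.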
Software choice

An advantage of using commercially available software is that subsequent users may add features and configure the database to suit their needs. The database was built in FileMaker™ Pro 6.0, by FileMaker Inc. This commercial software is well-established, well-supported, relatively inexpensive, widely used, multiplatform (Apple Macintosh and PC), and available in several languages. It can be programmed by project staff or subsequent researchers relatively easily. It is also possible to make ‘run-time’ versions of the database that allow users to use the database as a read-only tool without needing to buy a licence.

Imaging considerations

During data entry it was found that the speed of working on a recipe was directly related to the quality of the digital image. It stands to reason that this will also be the case for researchers using the database. Colour imaging was found to be essential, both for distinguishing later additions to manuscript pages, and for improving the contrast between pale or faded inks and coloured paper. Because large image files slowed the use of the database, high-quality images were archived, and working copies of smaller file size (and lower quality) were generated using the batch processing facility of Adobe Photoshop® 7.0. As computer speed increases, the higher-quality images can replace those currently in use.

The choice of technique for digitizing an archive is commonly determined by available funding, for equipment and for operator time. An inexpensive A4 flatbed scanner using 300 dpi resolution and 24-bit RGB colour gave excellent results, but unfortunately scanning was slow, requiring about 2 min per page. Because depth of field is less than for a camera, scanning into the gutter of some books was a problem, and flatbed scanning is inappropriate for large or fragile books. Digital photography, which generated page-images in a fraction of the time, provided less crisp but certainly adequate images.
A Canon PowerShot S60 was used, generating images with 2592 pixels × 1944 pixels, 24-bit RGB, and a mean file size of 1.8 MB, JPEG compressed. Canon digital cameras are currently unique in producing a live video image directly onto a laptop. Extremely useful for framing the image and for checking lighting, this feature greatly improved quality and productivity during image capture.

Future developments: the Winsor & Newton database

Ultimately it is intended that the Winsor & Newton archive database will function as a research tool for historical artists’ materials and techniques. Because Winsor & Newton is still very much an active company, certain restrictions on access will be in place. It is anticipated that upon application to the company, serious scholars will be issued a password allowing them to access the database, which will be resident at several established institutions, or through a secure web site.

Two years of funding have been secured for image capture, database design and development, and the implementation of a full subject index of the material relevant to oil painting. This work will be completed by the end of 2005. Although a functional database will result, further funding will be required to allow data entry for all current fields in the database including entries for individual materials and amounts used in the recipes.

Future developments: the database as a stand-alone tool

This database structure is ideally suited to the publishing of manuscript sources, and rare published sources pertaining to art technology. This will be especially useful for mediaeval manuscripts, which are particularly time-consuming and difficult to edit. Their interpretation typically requires a collaborative interdisciplinary approach, for which this database structure is ideal.
The initial researcher can index to the limit of his or her skill, time, or interest, then the electronic version can be passed around interested readers from various disciplines, each adding their own insights, translations and interpretations. Ultimately a series of such databases could be combined into a comprehensive wide-ranging historical materials database.

Page-image databases are not confined to art technological source research uses, but would be appropriate for ‘publishing’ all types of manuscript, archive or rare printed books and for searchable image databases.

Conclusions

The page-image database is time-saving and highly flexible. It combines the advantages of photographic reproduction with those of electronic text management. The ease of finding relevant manuscript material, and the searchability that a database affords, are thus available without laborious transcription or editing. Diagrams and visual clues are present, and any interpretation can be checked against the original. Content can be enriched by successive researchers.

Currently, when conservators and conservation scientists wish to consult technical sources to answer questions about a painting’s composition, the choice is either to refer to the same few sources that are already published, or to face the rather daunting task of sorting through a vast mass of unpublished material. This new database design will allow researchers to go directly to the relevant sections in a far wider body of primary sources.

Acknowledgements

We warmly thank Colart, the parent company of Winsor & Newton, for all their support, encouragement and assistance, especially Richard Goodban, Ian K Garrett, Alun Foster, Emma Pearce and Sarah Miller. We are indebted to Bas van Velzen for building the draft version of the database, and to Maartje Witlox for advice on programming. We thank the Instituut Collectie Nederland for hosting this project and the Stichting Restauratie Atelier Limburg for administrative support.
This two-year project is funded by the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (Netherlands Organisation for Scientific Research, NWO) and is included within the De Mayerne Programme: we are grateful for their support.

Note

1. A powerful but simple system for linking and moving between such ‘parent’ recipes and their components has been devised. The key to this was establishing meaningful file-name and recipe-name conventions. Each book in the archive was assigned a two-character code (for example R1), and each page was designated P and given a three-digit page number, plus A or B for recto and verso. Each page-image file was then given a filename based on these two codes, for example the JPEG image for page 23 of book R1 was named R1P023A.JPG. Each recipe was given a unique code constructed in a similar way: book code + page code + line number of first line (for example L01), so that a recipe code would appear as R1P023AL01. The advantages of naming a recipe by the number of the first line in the book where it appears are that (a) it will be a unique code, and (b) if a recipe is combined with another, there will be no duplication of numbers, nor any difficulty with interpolating a number if the recipe is subsequently subdivided. The biggest advantage of this system is that the database can use these codes to extract the necessary information to link recipe and image: for example the image for the first page of a recipe called R1P023AL01 is clearly R1P023A.JPG, and vice versa. The database can also automatically assign the relevant bibliographic data to images and to recipes based on these numbers, which is a great saving in time. These unique recipe codes can also be used by researchers as references when citing the texts.

References

Hermens, E, Kwakernaak, A, van der Berg, K-J and Geldorf, M, 2002, ‘A travel experience: the Corot Painting Box: Matthijs Maris and 19th century tube paints’, Art Matters 1, 104–121.
Woodcock, S, 1995, ‘The Roberson Archive: content and significance’, in A Wallert, E Hermens and M Peek (eds), Historical Painting Techniques, Materials and Studio Practice, Getty Conservation Institute, Los Angeles, 30–37.
Woodcock, S and Churchman, J (eds), 1997, Index of Account Holders in the Roberson Archive 1820–1939, Cambridge.
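Illustration of Note 1

The file-name and recipe-code convention set out in Note 1 can be sketched as follows. This is an illustrative reading of the convention, not the code used in the database: the function names are hypothetical, and the pattern assumes that book codes use capital letters and digits and that line numbers have at least two digits, as in the single example L01 given in the note.

```python
import re

# Recipe code = book code (two characters, e.g. R1)
#             + page code (P, three digits, A or B for recto or verso)
#             + L and the number of the recipe's first line.
RECIPE_CODE = re.compile(
    r"^(?P<book>[A-Z0-9]{2})P(?P<page>\d{3})(?P<side>[AB])L(?P<line>\d{2,})$"
)


def parse_recipe_code(code):
    """Split a recipe code into its bibliographic parts."""
    m = RECIPE_CODE.match(code)
    if m is None:
        raise ValueError(f"not a valid recipe code: {code!r}")
    return {
        "book": m["book"],
        "page": int(m["page"]),
        "side": "recto" if m["side"] == "A" else "verso",
        "first_line": int(m["line"]),
    }


def image_filename(code):
    """Derive the page-image file name for the first page of a recipe."""
    m = RECIPE_CODE.match(code)
    if m is None:
        raise ValueError(f"not a valid recipe code: {code!r}")
    return f"{m['book']}P{m['page']}{m['side']}.JPG"


def make_recipe_code(book, page, side, first_line):
    """Build a recipe code from its parts (side is 'A' or 'B')."""
    return f"{book}P{page:03d}{side}L{first_line:02d}"


print(image_filename("R1P023AL01"))
```

Because the image name is a prefix of the recipe code, the link between a recipe and its page-image needs no separate lookup table, which is the saving in time that the note describes.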