Erin M Sladen
Digital Libraries - e553
Tefko Saracevic
Final Term Project
Database Proposal and Mock-up: Instructional Cooking Database
Mock-up website: http://eden.rutgers.edu/~ems295/553/term/home.html

Abstract
The goal of this paper is to present a proposal for a database of instructional cooking materials. This topic was chosen because making, preparing, and eating food is a universal need for people of all ages, genders, occupations, and ethnic and socioeconomic backgrounds. My hope is that a database such as the one proposed here would allow users to grow their understanding of food preparation, nutrition, and kitchen tool usage by bringing instructional cooking materials from the internet, video, and print together into one searchable database. Because this database would be a large undertaking that involves gathering many varied sources together, legal and licensing fees may be high, and access may be restricted to paying members or institutions. This paper walks through the steps of setting up the database, including a mock-up of the proposed site's navigation, design, and content; the metadata to include; the digitization standards to adopt; and how to provide access to the database.

Keywords: cooking, instructional, access, permissions, licensing, evaluation.

Table of Contents
Abstract
1. Purpose
   a. Mission Statement
2. Objectives
3. Content
   a. Categories
   b. Images and Documents
   c. Videos
   d. Courses
4. Design and Mock-up
5. Metadata
6. Content Management and Searching
7. Legal
8. Digitization
   a. Standards
   b. Costs
9. Preservation
10. Access
11. Evaluation
12. Conclusion
13. Works Cited

1. Purpose
The purpose of this database, the Instructional Cooking Database (ICD), is simple: to provide, in one centralized place, a library of instructional cooking media. Cooking is something that nearly everyone does on a regular basis.
ICD will bring together many of the resources available in different media so that users can search and access materials on any topic of their choice. This database is not meant to be a place for recipes, although it may contain them; rather, the hope is that it will be a learning and instructional tool that users can consult about cooking methods, use to gain experience in areas such as nutrition, turn to for videos from specific instructors, and draw on for ideas, enabling the ordinary cook to learn and grow.

a. Mission Statement
The purpose of ICD will be to increase access to diverse instructional cooking materials in order to help grow the community's relationship with food.

2. Objectives
The objectives for this proposal are threefold:
a. To plan the basics of a digital library database, including legal, access, searching, and other considerations;
b. To create a mock-up web page design of the site; and
c. To learn in depth all of the steps that go into creating and sustaining a digital library, with the hope of one day making this database a reality.

3. Content
[Figure 3.1: Screenshot of the ICD home page]
Content will be divided in several ways to make the home page easy to browse. Figure 3.1 shows a screenshot of the home page. A simple logo at the top of the page identifies the site as the Instructional Cooking Database (ICD) and is directly followed by a straightforward navigation system. Aside from the home page, there will be four other pages directing the user towards content: a. Categories, b. Images, c. Videos, and d. Courses. A "you-are-here" indicator tells users which page they are on by showing that page's heading in bold, turquoise text; in Figure 3.1, the user is on the "Home" page. These four pages, described in further detail below, will offer options for browsing to find information. This approach is loosely modeled on the "Browse the Collections" section of the Perseus Digital Library.
Within ICD, users can browse the collections by selecting a topic from the main navigation menu and then selecting further sub-topics until they find one of interest. Everything described here is also illustrated on the mock-up web page for this proposal (viewable at http://eden.rutgers.edu/~ems295/553/term/home.html). The web page does not, however, have the browsing capabilities proposed for the finished site; it is meant purely as a detailed illustration of my design intentions, alongside this written proposal. If the user would prefer to search rather than browse, a search option will also be available. Please see Section 5: Metadata for more information.

a. Categories
[Figure 3.2: Categories drop-down menu]
The first topic, Categories, is divided into multiple sub-topics listed in a drop-down menu, as shown in Figure 3.2. Clicking on a sub-topic in the drop-down menu will take the user to a description of that sub-topic; the sub-topics are described in more detail both below and on the Categories page. Within each sub-topic will be links that take the user to digital collections. These collections will be both browsable and searchable, and each entry will carry a difficulty rating. For example, the sub-topic "Health Concerns" will lead the user to multiple categories, including "Gluten Free." By clicking on the "Gluten Free" link, the user will be able to access all content within the database that is tagged with this category. For more information on the tagging system, please see Section 5: Metadata. This is a proposed concept that is not available on the mock-up.

This page and its sub-topics should be well maintained throughout the life of the database by updating lists, creating new sub-topics, and removing outdated information. As such, the current lists of topics and sub-topics are not yet complete, and never truly will be.
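The tag-based browsing described above, where every item tagged "Gluten Free" is reachable from the "Health Concerns" sub-topic, could be backed by a simple item table and a separate tag table. The following is a minimal sketch rather than a design from this proposal: it assumes Python's built-in sqlite3 module as a stand-in for the MySQL database discussed in Section 6, and the table layout, the sample entries, sub-topic names such as "Tool Use", and the 1 to 5 difficulty scale are all hypothetical.

```python
import sqlite3

# Hypothetical sample entries: (title, format, difficulty on an assumed 1-5 scale).
ITEMS = [
    ("How to season a cast iron pan", "video", 1),
    ("How to sharpen a knife", "video", 2),
    ("Gluten-free pie crust", "image", 3),
    ("Gluten-free sourdough starter", "video", 4),
]

# Hypothetical tags: (item title, category, sub-topic).
TAGS = [
    ("How to season a cast iron pan", "Topic", "Tool Use"),
    ("How to sharpen a knife", "Topic", "Tool Use"),
    ("Gluten-free pie crust", "Health Concerns", "Gluten Free"),
    ("Gluten-free sourdough starter", "Health Concerns", "Gluten Free"),
]


def build_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog of items and their category tags."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE item (
            id INTEGER PRIMARY KEY,
            title TEXT UNIQUE NOT NULL,
            format TEXT NOT NULL,
            difficulty INTEGER NOT NULL
        );
        CREATE TABLE tag (
            item_id INTEGER NOT NULL REFERENCES item(id),
            category TEXT NOT NULL,
            subtopic TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO item (title, format, difficulty) VALUES (?, ?, ?)", ITEMS
    )
    # Look up each item's id by title and attach the tag to it.
    conn.executemany(
        "INSERT INTO tag (item_id, category, subtopic) "
        "SELECT id, ?, ? FROM item WHERE title = ?",
        [(category, subtopic, title) for title, category, subtopic in TAGS],
    )
    return conn


def browse(conn, category, subtopic, max_difficulty=5):
    """Return (title, format, difficulty) for items tagged with a sub-topic,
    easiest first, optionally capped at a maximum difficulty rating."""
    rows = conn.execute(
        """
        SELECT item.title, item.format, item.difficulty
        FROM item JOIN tag ON tag.item_id = item.id
        WHERE tag.category = ? AND tag.subtopic = ? AND item.difficulty <= ?
        ORDER BY item.difficulty, item.title
        """,
        (category, subtopic, max_difficulty),
    )
    return rows.fetchall()


if __name__ == "__main__":
    catalog = build_catalog()
    for row in browse(catalog, "Health Concerns", "Gluten Free"):
        print(row)
```

Because tags live in their own table, one item can carry several tags (a gluten-free video could also sit under an Ingredient or Instructor sub-topic) without duplicating the item record, and the difficulty rating becomes a simple filter on the same query.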
For more on maintaining the Categories page and its sub-topics, please see Section 9: Preservation. Some examples of the sub-topics that will be available to browse are:
i. Type
This category will consist of the different formats in which information is available, such as video or images.
ii. Topic
A breakdown of the many different topics covered within this database, including holiday cooking, how to use tools (e.g., how to sharpen a knife or season a cast iron pan), instructions for different meals, and more.
iii. Instructor
The various instructors whose work is available for viewing within the database.
iv. Affiliation
The "Affiliation" category groups information by its original publisher or presenter. Examples include videos originally shown on the Food Network or PBS, information originally published in Cook's Illustrated magazine, and information coming directly from blogs or websites such as YouTube.
v. Ingredient
A breakdown of topics relating to various ingredients. As stated previously, this database is not meant to be a place for recipes. While this category will house many ingredient-related recipes, it will also be a place for explaining techniques such as how to cut apart a whole chicken or how to make stock from scratch.
vi. Health Concerns
Information related to various health concerns, such as a gluten-free diet, food allergies, or vegetarian needs.

b. Images and Documents
The content linked to in this section will consist of browsable and searchable static images or sets of images, such as illustrations or photographs, that display cooking instructions alongside written directions. This information may come from a variety of sources, including cookbooks, blogs, or pamphlets. These images will be arranged by categories similar to those listed above in the Categories section. Each image will be its own entry in the database and will be searchable.

c. Videos
The content linked to in this section will consist of instructional cooking videos on various topics. Like the images, these videos will be searchable by some of the same topics listed in the Categories section. Two forms of video will be available from this page:
i. Videos that are uploaded to the database and watched within the database.
ii. Videos that are linked to from the database but hosted externally.
These videos will either be created specifically for this database or come from other sources and be uploaded and displayed through a licensing agreement. For more on this, see Section 7: Legal. Each individual video will have its own entry in the database and will be searchable.

d. Courses
In addition to corralling instructional cooking media that exists around the web, I propose creating some content specifically for this database. This model is based on Lynda Campus, an organization that creates video demos and courses explaining technology and software concepts. I imagine adapting this model to fit my digital library of cooking techniques, either by using licensed, already-created videos or by creating videos specifically for ICD. These courses would include multiple videos on specific cooking and food-related skills, ranging from in-depth courses on the basics of a type of international cuisine to quick tips and five-minute lessons.

4. Design and Mock-up
View the mock-up here: http://eden.rutgers.edu/~ems295/553/term/home.html
A mock-up was created to allow the reader to better visualize my intentions and concepts for the proposed ICD digital library. The mock-up was built using HTML and CSS and has a working navigation structure with multiple pages to guide the user around the site. Visiting it will give the reader an accurate depiction of my initial design plans.
[Figure 4.1: ICD logo]
A logo, shown in Figure 4.1, was created for the database.
This logo also guides the color scheme for the web page: a white background with black text and accents in turquoise and orange. A simple navigation system guides the user from one topic to the next, including links that jump directly to sub-topics. Much still needs to be added to the mock-up for this digital library to become a reality; for example, there is currently no content available for viewing on the web page. For more information on adding content, please see Section 6: Content Management and Searching. Although it lacks content, the mock-up was necessary to best present the design and structure of the proposed digital library, and the website itself is functional and easy to understand and navigate. The finalized digital library will follow suit.

5. Metadata
Metadata, and the process of tagging information with it, makes an item easy to find later and also ensures that the data describing the item, such as its title or creator, is not lost. For this database I have chosen to use the Dublin Core metadata standard, which consists of 15 elements:
i. Title
ii. Creator
iii. Subject
iv. Description
v. Publisher
vi. Contributor
vii. Date
viii. Type
ix. Format
x. Identifier
xi. Source
xii. Language
xiii. Relation
xiv. Coverage
xv. Rights Management
Dublin Core was chosen because of its simplicity and straightforwardness. For each item entered into the database, as many of these 15 metadata elements as possible will be used to describe it, although in many instances not all 15 elements will be available. For example, Figure 5.1 shows the metadata added for a specific video, the episode "The Good Loaf" from Julia Child's television show The French Chef. This video is one example of the many types of instructional media that will exist in ICD.
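A Dublin Core record like the one in Figure 5.1 could be stored and exchanged as simple XML. The sketch below is an illustration only, assuming Python's standard xml.etree.ElementTree module and the standard Dublin Core 1.1 namespace URI; the wrapper element name `metadata` and the function name are hypothetical. It skips empty elements, reflecting the point above that not every item will have all 15 elements, and it allows an element to repeat.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements in the order listed above ("rights" = Rights Management).
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def to_dublin_core_xml(record: dict) -> str:
    """Serialize a record to simple Dublin Core XML.

    Each value may be a string or a list of strings, because Dublin Core
    elements are repeatable. Empty or missing elements are skipped, and
    keys that are not Dublin Core elements raise ValueError.
    """
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")

    ET.register_namespace("dc", DC_NS)  # emit dc: prefixes instead of ns0:
    root = ET.Element("metadata")       # hypothetical wrapper element
    for name in DC_ELEMENTS:            # fixed order keeps output predictable
        values = record.get(name) or []
        if isinstance(values, str):
            values = [values]
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Partial, illustrative record using only details named in this paper;
    # it is not a reproduction of Figure 5.1, and other elements are left out.
    good_loaf = {
        "title": "The Good Loaf",
        "creator": "Julia Child",
        "publisher": "WGBH Boston",
        "type": "MovingImage",
        "relation": "The French Chef",
    }
    print(to_dublin_core_xml(good_loaf))
```

Rejecting unknown keys is a deliberate choice in this sketch: it catches misspelled element names at data-entry time, which matters when metadata is entered by hand.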
I found the episode on a DVD of the show, though it has at times been available on the PBS website and is currently streaming on YouTube. The metadata for this item is listed as fully as possible, though there is some overlap, especially between Identifier and Source. Because this is a 40-year-old television show, many gaps exist, and the metadata has been applied to the best of the creator's ability.
[Figure 5.1: Metadata for "The Good Loaf"]
Adding metadata is not a complex process, but it is a long and laborious one that usually must be done by a human. Collecting content and creating metadata for that content will be one of the biggest, most expensive, and most time-consuming parts of making this digital library fully operational. It is also a critical part of creating the database, because it ensures that the information added to ICD will not only exist safely but also be findable by its users.

6. Content Management and Searching
Content placed in this database will need to be stored and managed, which can be accomplished with a PHP application backed by an SQL database such as MySQL. For the purposes of this proposal, this database storage was not built into the mock-up site. Each entry in the database will have multiple fields describing it. In most cases these fields will be similar or identical to the metadata elements described in Section 5: Metadata. By using the metadata standards, we will be putting the metadata to double use: it will contain information about an item, and it will allow a user to find that item. When a user searches, the site's search engine can look through the metadata fields to find information relevant to the query. While there is no cost to use MySQL, setting it up and entering content is time-consuming, and that expense of time must be taken into account.

7. Legal
Because of the wide scope of this proposal, obtaining licensing agreements from the owners of the videos, images, and other content I wish to use will be one of the biggest challenges for this database. Many companies with a large volume of instructional cooking materials impose heavy copyright restrictions or charge fees to view, use, and reproduce their materials. To get a feel for the legal and copyright issues surrounding this proposal, I investigated the Food Network's terms of use in depth. The terms of use state that while the website may be accessed and used at no cost, the content it presents is the intellectual property of the Food Network and cannot be reproduced or copied elsewhere. However, the terms do state that linking to the website is allowed. They do not specify what types of links are permitted, though, and the extensive linking planned for this database would most likely not be allowed. A licensing agreement between the Food Network Corporation and the ICD digital library would most likely need to be reached. I attempted to contact the Food Network about the possibility of licensing but have not received a response. I will assume that for many current publications, licensing agreements will need to be in place before I can use their materials. Because of this, it may be most beneficial for the ICD digital library to focus first on public domain items: cookbooks and magazines, pamphlets, utensil and cookware instructions, and personal narrative accounts. If ICD begins by digitizing these objects and giving proper citations to their authors, it can build its collection and become functional before requesting licensing agreements with large corporations. Another option is for ICD to pull related information from the Rutgers Libraries databases.
Rutgers already has licensing agreements in place with many information providers, which would mean that ICD would not have to handle this licensing itself. To access this content through the ICD database, however, users would have to be members of the Rutgers community and enter their credentials before viewing material.

8. Digitization
Digitization is an important part of every digital library or database. It entails taking an object from its original form and converting it into the formats required by the intended digital library. Much of the content I intend to use has either already been digitized or was born digital. Nevertheless, a breakdown of digitization standards and costs is still appropriate. The highest possible standards should be used when digitizing items, while costs should remain appropriate to the scope of the digitization.

a. Standards
The standards I propose will vary based on the type of object being digitized. Generally, any static item (text, photographs, illustrations and artwork, maps, and more) will be digitized at high resolution with a minimum bit depth of 8 bits for greyscale objects and 24 bits for color objects. The University of Colorado's Digital Library standards are an ideal guideline for these minimum requirements. Items that may be digitized in this way include old public domain or licensed cookbooks, magazine articles, and pamphlets.
[Figure 8.1: Resolution, bit depth, and file format standards for static objects (University of Colorado Digital Library, 2009, p. 4)]
As Figure 8.1 shows, in addition to resolution and bit depth standards, TIF is the suggested file format for these objects. These standards will ensure that all converted items will be readable and viewable on a typical computer monitor. Similarly, the University of Colorado's guidelines for digitizing audio suggest, as shown in Figure 8.2, a minimum sample rate of 44.1 kHz and a bit depth of 16 bits. For audio files, the WAV or AIF file formats should be used.
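The minimums above lend themselves to a simple quality-control check run on each file before it is added to ICD. The sketch below is hypothetical: the class and function names are illustrative, Python 3.9 or later is assumed for the `list[str]` annotations, and it encodes only the bit-depth, sample-rate, and file-format minimums stated in this section. Resolution is left out because no resolution figure survives in the text, and accepting the longer `.tiff` and `.aiff` spellings is an added assumption.

```python
from dataclasses import dataclass

# Minimums as summarized above from the University of Colorado guidelines.
# Resolution is deliberately omitted: the text gives no figure for it.
IMAGE_MIN_BIT_DEPTH = {"greyscale": 8, "color": 24}
IMAGE_FORMATS = {"tif", "tiff"}
AUDIO_MIN_SAMPLE_RATE_HZ = 44_100
AUDIO_MIN_BIT_DEPTH = 16
AUDIO_FORMATS = {"wav", "aif", "aiff"}


@dataclass
class ImageScan:
    filename: str
    mode: str       # "greyscale" or "color"
    bit_depth: int  # bits per pixel


@dataclass
class AudioCapture:
    filename: str
    sample_rate_hz: int
    bit_depth: int  # bits per sample


def extension(filename: str) -> str:
    """Lower-cased file extension without the dot ('' if there is none)."""
    return filename.rsplit(".", 1)[1].lower() if "." in filename else ""


def image_problems(scan: ImageScan) -> list[str]:
    """List every way a still-image scan falls short of the minimums."""
    problems = []
    if extension(scan.filename) not in IMAGE_FORMATS:
        problems.append("file format should be TIF")
    minimum = IMAGE_MIN_BIT_DEPTH.get(scan.mode)
    if minimum is None:
        problems.append(f"unknown mode {scan.mode!r}")
    elif scan.bit_depth < minimum:
        problems.append(f"{scan.mode} bit depth {scan.bit_depth} < {minimum}")
    return problems


def audio_problems(capture: AudioCapture) -> list[str]:
    """List every way an audio capture falls short of the minimums."""
    problems = []
    if extension(capture.filename) not in AUDIO_FORMATS:
        problems.append("file format should be WAV or AIF")
    if capture.sample_rate_hz < AUDIO_MIN_SAMPLE_RATE_HZ:
        problems.append(
            f"sample rate {capture.sample_rate_hz} Hz < {AUDIO_MIN_SAMPLE_RATE_HZ} Hz"
        )
    if capture.bit_depth < AUDIO_MIN_BIT_DEPTH:
        problems.append(f"bit depth {capture.bit_depth} < {AUDIO_MIN_BIT_DEPTH}")
    return problems
```

Returning a list of problems rather than a single pass/fail flag lets a digitization worker see everything that needs to be redone for a file at once.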
[Figure 8.2: Minimal Requirements for Digitizing Audio (University of Colorado Digital Library, 2009, p. 7)]
Lastly, for video I suggest following the New York University Libraries standards (De Stefano et al., 2013, pp. 6-8):
i. Different File Types
1. A long-term "preservation file," which will be the master file and will not be touched or altered: MOV file extension; uncompressed, 10-bit 4:2:2 video stream; 48 kHz audio stream.
2. A "mezzanine file," which serves as a surrogate for the master file and on which changes can be made: MOV file extension; DV50 video stream; 48 kHz audio stream.
3. An "access file," which serves as the general-use copy for users to view: WMV file extension; Windows Media video stream at 700 kbps; 44.1 kHz audio stream.

b. Costs
Quality scanners and other equipment would be required to convert static images into usable digital formats; this is one example of the costs of digitizing this collection. For a digital camera, flatbed scanner, slide and film scanner, and document scanner, the ICD digital library could potentially pay $3,000 or more for equipment alone. Additionally, there would be costs to employ workers to digitize items, create metadata, and upload items to the proper portions of the database. For these reasons, in addition to licensing fees, the cost of digitization may be high, but it is an essential part of database creation that cannot be overlooked.

9. Preservation
Preservation of the resources in this database will be critical to its long-term survival and use. Preservation will include site management and updates to ensure that the pages within the site remain current in terms of both content and HTML/CSS and browser requirements. Preservation will also include backing up the data stored in the digital library. This will be done physically, by backing up data in multiple places, such as on several different hard drives.
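Backing up data in several places is most useful when each copy can be checked for silent corruption. A common way to do this is to record a checksum (a fixity value) for each master file and periodically compare every copy against it. The sketch below is a hypothetical illustration using Python's standard hashlib, shutil, and pathlib modules with SHA-256; the function names and the directories standing in for separate hard drives are assumptions, not part of this proposal.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large videos need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(master: Path, backup_dirs: list[Path]) -> str:
    """Copy a master file into each backup location and verify every copy.

    Returns the master's checksum, which could be stored (for example in the
    item's database record) for later audits.
    """
    expected = sha256_of(master)
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        copy = directory / master.name
        shutil.copy2(master, copy)
        if sha256_of(copy) != expected:
            raise OSError(f"checksum mismatch after copying to {copy}")
    return expected


def audit(filename: str, expected: str, backup_dirs: list[Path]) -> list[Path]:
    """Return the backup copies that are missing or no longer match."""
    return [
        directory / filename
        for directory in backup_dirs
        if not (directory / filename).exists()
        or sha256_of(directory / filename) != expected
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        master = root / "master.mov"          # hypothetical preservation file
        master.write_bytes(b"placeholder video bytes")
        drives = [root / "drive_a", root / "drive_b"]
        checksum = replicate(master, drives)
        (drives[1] / "master.mov").write_bytes(b"corrupted")
        print(audit("master.mov", checksum, drives))  # lists only drive_b's copy
```

Keeping the digest alongside the item's metadata means a routine audit can flag a damaged copy on one drive and restore it from another before every copy is lost.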
Preservation will also be accomplished by saving the data in multiple formats to guard against format obsolescence. Like metadata and content management, preservation is a costly and time-consuming process, but it should not be overlooked; it is necessary for the long-term survival of this and any digital library.

10. Access
The ICD digital library would be accessible through a user-fee model similar to that of other databases. However, because this database is intended to be used by the general public, I believe it is important to consider methods that would keep this fee as low and reasonable as possible. One option would be to restrict access to only part of the database. Content that is drawn from the public domain, or merely cataloged in the database and then linked to at its original source, as discussed in Section 7: Legal, should be available at no cost to anyone. Content that is under a licensing agreement will require either a one-time fee per item or a yearly umbrella membership fee, depending on whether the user is an institution or an individual. For institutions that pay a yearly fee, the database will be accessible through a link on their web page, similar to how many databases are accessible via the Rutgers Libraries website, requiring users only to sign in with their credentials. For users accessing the database without an institution, the mock-up web page will be turned into a landing page where they can sign in to their membership and access the content.

11. Evaluation
Evaluation of a resource such as a digital library is an important part of its development and maintenance. To be sure that the ICD digital library is up to standards, presents its information appropriately, and uses its funds in a way that matches what its users require, I have chosen six evaluation criteria:
i. Content
The content in ICD should be a good representation of the available literature and information.
It should be organized in a way that is obvious and easy for the user to understand, and it should be presented in an honest, straightforward fashion. My plans to organize and present the information, as detailed above in Section 3: Content, work together with my plans for metadata and searching (see Section 5) to create easily findable and usable content. Other things that should be evaluated are the relevance and accuracy of each item and of the database as a whole.
ii. Technology
Do the hardware and software work for the purposes of this library? One example is the SQL/PHP setup for adding content to the digital library. This may be a successful means of adding content, but proper evaluation will ensure that its performance remains up to par. Technology evaluation will also focus on cost-effectiveness and on how easily the technology can be used on both the user's and the creator's sides.
iii. Interface
The interface of the digital library should be usable, meaning it should support user interaction. Users should be able to use the site easily: accessibility should be high, the error rate should be low, and the site's organization should support the interface. The interface should be visually appealing and consistent throughout the library.
iv. Process and Service
A wide range of services should be offered to the user through the site: straightforward navigation, searching, a low error rate, easy browsing, and a straightforward means of obtaining a desired resource within the database.
v. User
Use of the ICD digital library should positively affect users. Evaluation of this criterion should ensure that users are receiving information, in the areas they are searching, that actively affects their lives. The content and information received should positively affect their cooking and nutrition going forward.
vi. Context
The idea of an instructional cooking digital library makes sense in the current world: cooking is something that most of us do on a daily basis. As long as this is true, the database should focus on bringing together relevant cooking methods, information about food and tool usage, and other material that fits this context. Adding and removing appropriate content will be important to maintaining this context.

While the plans for evaluation listed here are straightforward, they are not set in stone and should be reviewed and updated frequently, as the need arises. It may also be necessary to evaluate the costs of salaries, equipment, and other fees. The key to successful evaluation is consistent evaluation. Some of my proposed means of evaluating this digital library in the future are:
a. User surveys
b. Statistical reports of downloads and download times
c. Search reports: When did a search lead to a download? When was it repeated multiple times without a satisfactory result? How often was it abandoned?
d. Personal observations and reactions
e. Focus groups
f. User interviews

12. Conclusion
I proposed this project because cooking is something that I do daily. I find a lot of joy in cooking, but I am also frustrated when I want to know how to do something but cannot easily learn how. A centralized source of knowledge for all cooking instructions would be exceptionally helpful to me and, I believe, to many other people. The steps outlined here are vast, laborious, and complicated. I have taken the first big step by outlining my ideas for this database, creating the HTML mock-up to showcase my proposed design, describing various content, and explaining strategies for the database such as metadata, licensing, and preservation. Making the database described here a reality would clearly be an incredible undertaking and, at this moment, is well beyond what I am presently capable of.
However, I do believe that I could begin to build a functional database by taking small steps rather than attempting to fulfill all the sections listed here at once. If I were to go forward, my first step would be to create a responsive, PHP- and SQL-driven website. From there, I would add content gradually, starting solely with links to Rutgers' licensed materials, so that anyone in the Rutgers community would be able to access them. This strategy would mean I wouldn't immediately be stuck handling access or licensing issues, which, because of the many high-profile media available and desirable for use, would be a monumental and exceptionally expensive task that I could not take on by myself. This is something I sincerely hope to work on in the future, beyond the scope of this class. I have already learned so much just by setting up a proposal and thinking deeply about how I would carry out my plans, and I know that I could learn a lot more by moving the proposal forward into its next phase.

13. Works Cited
Bibliographical Center for Research (BCR). (2008). BCR's CDP digital imaging best practices, Version 2.0. Retrieved from http://mwdl.org/docs/digitalimaging-bp_2.0.pdf
Bridges. (2007). 15 Dublin core element attributes. Minnesota Metadata Guidelines – Dublin Core. Retrieved from http://mn.gov/bridges/dcore.html
Crane, G. R. (Ed.). (n.d.). Browse the collections. Perseus Digital Library. Retrieved from http://www.perseus.tufts.edu/hopper/collections
De Stefano, P., et al. (2013). Digitizing video for long-term preservation: An RFP guide and template. New York University Libraries. Retrieved from http://library.nyu.edu/preservation/VARRFP.pdf
The Food Network. (2014). Terms of use. Retrieved from http://www.scrippsnetworksinteractive.com/terms-of-use/
Gilliland, A. J. (2008). Setting the stage. In Introduction to metadata: Pathways to digital information (Online ed., Version 3.0). Los Angeles, CA: The J. Paul Getty Trust. Retrieved from http://www.getty.edu/research/publications/electronic_publications/intrometadata/setting.html
Hillmann, D. (2001). Generic examples. Metadata Dublin Core Usage Guide. Retrieved from http://dublincore.org/documents/2001/04/12/usageguide/generic.shtml
Ickes, M., & Gambescia, S. (2011). Abstract art: How to write competitive conference and journal abstracts. Health Promotion Practice, 12(4), 493-496.
International Federation of Library Associations and Institutions (IFLA). (2002). Guidelines for digitization projects for collections and holdings in the public domain, particularly those held by libraries and archives. Retrieved from http://www.ifla.org/VII/s19/pubs/digit-guide.pdf
Lynda.com. (n.d.). Software training and tutorials. Lynda Campus. Retrieved from http://www.lynda.com
Metatags.org. (2014). How to use Dublin core metadata set. Dublin Core Metadata Initiative. Retrieved from http://www.metatags.org/dublin_core_metadata_element_set
PBS. (2013). Site terms of use. Retrieved from http://www.pbs.org/about/policies/terms-of-use/
Quam, E. (2002). Minnesota metadata guidelines for Dublin core metadata: Training manual. St. Paul, MN: Minnesota Department of Natural Resources. Retrieved from http://mn.gov/bridges/bestprac/training.pdf
Reese, W. (2014). Digital libraries term project: The North American Saxophone Alliance digital library proposal.
Saracevic, T. (2014). PowerPoint lecture: Evaluation in digital libraries. Retrieved from http://comminfo.rutgers.edu/%7Etefko/Courses/e553/Lectures/Lecture09_Evaluation1.ppt
Scott, A. (2008). Planning for successful digital imaging projects. In Thinking outside the borders (pp. 151-156). Urbana-Champaign, IL: Mortenson Center for International Library Programs at the University of Illinois. Retrieved from http://www.library.illinois.edu/mortenson/book/20_digitalimaging.pdf
University of Colorado Digital Library. (2009). Digitization best practices. Retrieved from https://www.cu.edu/digitallibrary/cudldigitizationbp.pdf
w3schools.com. (n.d.). RDF Dublin core metadata initiative. Retrieved from http://www.w3schools.com/webservices/ws_rdf_dublin.asp
WGBH Boston. (1972). The French Chef.