Digital Archiving of Astronomical Data to Support Publication and Long-term Preservation

Assessment of Need

One of the most fundamental aspects of scientific scholarly communication is the ability to cite and examine data in a persistent manner. Without this ability, the very essence of the scientific method, with its requirement of validating results, becomes compromised. Large-scale astronomy projects such as the Sloan Digital Sky Survey (http://www.sdss.org) have gathered data at unprecedented rates, raising new challenges and opportunities. This explosion in data-driven science has led to fundamental changes in practice and modes of inquiry, prompting the National Science Foundation (NSF) to advance the evaluation and development of Cyberinfrastructure to support large-scale, digital science projects. Both the Library of Congress' National Digital Information Infrastructure and Preservation Program (NDIIPP at http://www.digitalpreservation.gov) and the NSF Blue-Ribbon Panel on Cyberinfrastructure report (National Science Foundation 2003) stress that digital archiving of datasets is essential to ensure long-term access. Most importantly, this year’s Institute of Museum and Library Services’ (IMLS) National Leadership Grant guidelines for demonstration projects invite efforts to “develop pilot projects or programs in data curation.” This proposal directly addresses this important and urgent priority. Without immediate action, we may find ourselves in a “digital dark age,” losing important scholarly resources from the scientific domain.

The National Virtual Observatory (NVO) project is playing a leadership role in building services for the astronomy community to access and analyze astronomical data (http://us-vo.org). For good reason, the NVO is often cited as one of the quintessential cyberinfrastructure projects.
With projects such as NVO, the astronomy community has moved into the forefront of data-intensive digital science, providing a path for other disciplines to consider. However, thus far the scope of the NVO has deliberately not included long-term data curation, focusing instead on data location and data access standards and protocols. Based on extensive, ongoing dialogue and communication, the NVO project team, led by researchers at Johns Hopkins University (JHU), has concluded that academic research libraries represent the ideal home for long-term preservation and curation of large-scale datasets to support persistent access and scholarly communication, given their expertise and long-term, sustainable support from universities. NVO researchers are not only involved in this proposal, but they are driving the effort. The proposed work does not rest upon assumptions or inferences related to digital science, but rather upon firsthand feedback from and interaction with NVO. The work proposed herein reflects a pressing, clearly identified need with serious implications for scientific research and scholarly communication. The Library's involvement in this effort does not arise from an abstract or theoretical argument. The NVO research team has concluded that the Library should move to the center of data curation for various reasons, including its confidence in the Digital Knowledge Center (DKC) of the Sheridan Libraries, which combines the rich, historical principles of library science with a leadership role for digital library research and development.
National Impact and Intended Results

In the astronomy community there is a long-established partnership between the dominant, non-profit publishers such as the American Astronomical Society and its production partner the University of Chicago Press (Astrophysical Journal, Astronomical Journal) and astronomy data centers and bibliographic services (Astrophysics Data System, ADS, in the US; Centre de Données astronomique de Strasbourg, CDS, in Europe). This proposal offers an opportunity to move libraries from the periphery of projects such as NVO to the center of digital archiving and data curation efforts, and to establish a three-fold collaboration (publishers, an association of libraries, and NVO) that assures universal and long-term access and preservation. By incorporating NVO web services into a Fedora digital library framework (http://www.fedora.info/), we will provide mechanisms for long-term digital archiving of astronomical tables, catalogs, spectra, images and documents that facilitate data publishing for astronomers and scholarly journals.
The proposed work will achieve the following goals:

- Recognize university libraries' key role in digital archiving and data curation, and move libraries into the center of digital archiving and data curation efforts
- Deliver a functional system in the form of an appliance to be installed at partner libraries
- Demonstrate the viability of a Fedora-based repository as the foundation for a data and digital content curation infrastructure
- Provide long-term, persistent storage and access for cited datasets
- Develop services to place processed data online
- Supply a catalyst/template for other disciplines and organizations
- Increase integrity of scientific publication
- Propose a new model for data publishing (with libraries as digital annexes for journals)

With specific and substantive collaboration with the NVO and publishers, we will produce a human and technology infrastructure that will result in data curation of processed, digital science datasets to support publication and long-term preservation. While the proposed work focuses on astronomy, a discipline that is at the forefront of data-intensive scholarship, the results of this effort will provide a blueprint for other disciplines and a model for libraries to lead the efforts to curate data from large-scale, data-driven projects.

Project Design and Evaluation Plan

What does data curation mean for astronomers?

Astronomers' data include images, spectra, catalogs/tables, and documents or free-form data. Individual astronomers or teams of astronomers using ground-based and space-based instruments capture or create a portion of these data that provide the foundation for research and publication. Most of these data reside in systems optimized for the storage and retrieval of data from particular telescopes or facilities, but lacking generic data access or query mechanisms (a situation the NVO is beginning to rectify).
In any case, major astronomy archives focus on standard data products, and less so on highly processed images or spectra that are associated with peer-reviewed publications. Without an integrated system for individual scientists to deposit these data into a persistent library-based archive or NVO-standard interfaces for access, it is impossible to query these data, identify gaps in knowledge, cite the data within publications, or preserve them for long-term access. The barrier for participation in such a system should be low. Ideally, individual scientists should be able to check their processed, high-level data into library archives as easily as they can create web pages. This proposal describes several contributions that will result in such a capability. We will develop a set of web services that link literature and reference materials to astronomical datasets. These services will reflect actual use cases as defined by the NVO. For example, "identify all literature and images within this portion of the sky." We will map these web services into a Fedora-based digital library framework. Deposited data will ultimately be archived within the Library, which will serve as a digital annex for publications. Fedora's ability to integrate web services and object discovery facilities is particularly relevant in this regard. Through this effort, we will develop an appliance that will comprise both the hardware and software to manage this service. This appliance will be installed at our partner libraries at the University of Edinburgh and the University of Washington.

Fedora (Payette et al. 2003) is an open source repository system being actively developed by the University of Virginia and Cornell University. Unlike some other repository applications, Fedora was designed to support the association of behaviors with the digital objects it contains. These associations are called "disseminations" in Fedora.
So, instead of simply returning the content, as it was stored into the repository, Fedora can render the content or act on it in different ways. This coupling of digital objects with behaviors will allow richer interaction with the deposited content. The fact that Fedora is open source will give us the ability to modify the system, as necessary, and to distribute the appliance without concerns about software copyright. With the cooperation of the American Astronomical Society (AAS), the editorial staff of the Astrophysical Journal and Astronomical Journal, and the University of Chicago Press (UCP), we will develop an understanding with publishers to accept these data submission formats. Since the association of libraries will manage and preserve the deposited datasets, we will reduce the participation and entry barrier for publishers. Through these combined efforts, we will create a fully integrated network of processes, tools and systems to support ingestion and preservation of processed datasets to support publication.

Integration with the Virtual Observatory

A primary goal of the Virtual Observatory is to provide integrated access to archival data and derived data products: catalogs, tables, and highly processed images, spectra, and time series. The initial focus of NVO development has been on providing access to the former: the archival data sets that are already available via public interfaces on the web, but often with unique and incompatible interfaces. Derived data products are the purview of either dedicated large projects (such as the 2MASS or SDSS sky surveys) or of individual researchers. Large projects have thus far worked to provide access to these high level products. The valuable data from individuals or small collaborations sometimes appear in the electronic journals and sometimes on personal web sites, but most often these data are not available at all in any standard form or via any standard interface.
We envision closing this gap in data access (to those data products that are most valuable for comparative, multi-wavelength studies) through a technological partnership among the research astronomers, the peer-reviewed journals, the university libraries, and the Virtual Observatory (VO). By providing a simple mechanism for researchers to upload, register, and annotate their processed data in a permanent digital archive, we enable the VO to integrate the collection of processed data products into the general framework for data discovery and access. The standard VO methods for data access (catalogs, spectra, images) can be implemented as front-end services to the Fedora-based, distributed collection of high-level data products. A somewhat larger challenge comes in the area of data discovery, which depends on the availability of reliable metadata describing the datasets in a collection. The other key to data discovery is the extraction of coordinate system metadata. Astronomers throughout the world use the FITS data format standard (Hanisch et al. 2001) and the associated conventions for celestial coordinate systems (Greisen and Calabretta 2002, A&A 395, 1061; Calabretta and Greisen 2002, A&A 395, 1077). FITS images and spectra that are uploaded to the repository can easily have the coordinate-related metadata extracted into an associated metadata database. It is then a straightforward matter to implement the coordinate-based Simple Image Access Protocol and Simple Spectrum Access Protocol developed by the NVO so that all images and spectra in the collection can be located and accessed transparently by the research community. Catalogs and tables constitute the other major type of derived data, and again the repository must provide a mechanism for gathering and associating the relevant metadata.
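As an illustration of the coordinate metadata extraction described above, the following is a minimal sketch in Python, not the project's implementation. It assumes a simplified primary header made of fixed-width 80-character cards and reads only a handful of standard world coordinate system keywords (CTYPE, CRVAL, NAXIS). The helper names (`make_card`, `parse_header`, `coordinate_record`), the demo values, and the layout of the resulting record are hypothetical; a production service would more likely rely on an established FITS library and handle the full standard (quoted apostrophes, continuation cards, and so on).

```python
CARD_LEN = 80  # FITS header cards are fixed-width, 80 characters each


def make_card(keyword, value):
    """Build one 80-character header card (demo helper, simplified)."""
    if isinstance(value, str):
        text = f"{keyword:<8}= '{value:<8}'"
    else:
        text = f"{keyword:<8}= {value!r:>20}"
    return text.ljust(CARD_LEN)


def parse_header(header):
    """Return {keyword: value} for the value cards that precede END."""
    cards = {}
    for i in range(0, len(header), CARD_LEN):
        card = header[i:i + CARD_LEN]
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] != "= ":
            continue  # COMMENT, HISTORY and blank cards carry no value
        raw = card[10:]
        if raw.lstrip().startswith("'"):
            # String value: take the text between the first pair of quotes.
            start = raw.index("'") + 1
            end = raw.index("'", start)
            cards[keyword] = raw[start:end].rstrip()
            continue
        raw = raw.split("/", 1)[0].strip()  # drop any trailing comment
        try:
            cards[keyword] = int(raw)
        except ValueError:
            cards[keyword] = float(raw.replace("D", "E"))
    return cards


def coordinate_record(cards):
    """Build a small metadata row for an image with RA/Dec axes."""
    if not (cards["CTYPE1"].startswith("RA--")
            and cards["CTYPE2"].startswith("DEC-")):
        raise ValueError("sketch only handles RA/Dec axis order")
    return {
        "ra_deg": float(cards["CRVAL1"]),   # reference point, axis 1
        "dec_deg": float(cards["CRVAL2"]),  # reference point, axis 2
        "projection": cards["CTYPE1"][5:],  # e.g. TAN
        "width_px": cards["NAXIS1"],
        "height_px": cards["NAXIS2"],
    }


# Hypothetical header for a 512 x 512 image (values are made up).
DEMO_HEADER = "".join([
    make_card("NAXIS1", 512),
    make_card("NAXIS2", 512),
    make_card("CTYPE1", "RA---TAN"),
    make_card("CTYPE2", "DEC--TAN"),
    make_card("CRVAL1", 202.4842),
    make_card("CRVAL2", 47.1953),
    "COMMENT demo card".ljust(CARD_LEN),
    "END".ljust(CARD_LEN),
])
```

Calling `coordinate_record(parse_header(DEMO_HEADER))` yields a dictionary that could be stored as one row of the associated metadata database; a coordinate-based access service could then select images by comparing a requested sky position against the stored `ra_deg` and `dec_deg` values.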
The Virtual Observatory has developed both a standard data dictionary for tabular data, Uniform Content Descriptors (UCDs), and standard access methods (OpenSkyQuery and the Virtual Observatory Query Language). Catalog/table upload will include a process for associating UCDs with table columns, and the NVO can provide an OpenSkyQuery portal to the collection.

The DKC of the Sheridan Libraries, working with its library partners, will map the NVO tools, services, and metadata into a Fedora-based framework. The NVO and DKC will conduct this work as part of their ongoing and growing examination of data curation issues.

Fedora-based Mapping

The DKC represents a unique organization focused on digital library research and development. While the DKC is housed physically and administratively within the Libraries, its staff includes individuals with backgrounds in computer science, engineering, mathematics and cognitive science. This combination of perspectives has resulted in a comprehensive, diverse approach to digital library development. Program officers from NSF and IMLS have mentioned that the DKC is the only organization to receive grants from the NSF Digital Libraries Initiative, Phase 2 (DLI-2), Information Technology Research (ITR), and National Science Digital Library (NSDL) tracks, and IMLS' National Leadership Grant Program. DKC projects have focused on digital workflow management, especially the ingestion of and access to large digital collections. Through existing grants, the DKC has built both hardware (Suthakorn et al. 2003) and software (Droettboom et al. 2002), including tools for automated metadata generation (DiLauro et al. 2001). More recently, the DKC has focused on repository research, particularly repositories’ ability to support a range of services, including digital preservation. Led by the DKC, JHU participated in the Archive Ingest Handling Test (AIHT), a program within the Library of Congress' NDIIPP framework (DiLauro et al.
2006). The AIHT provided practical experience with ingestion of an archive that features multiple file formats, and varying levels of metadata. The AIHT test provided an excellent opportunity to test the capabilities of existing repository systems such as DSpace (Smith et al. 2003) and Fedora. JHU was the only institution that evaluated multiple systems. Through a grant from the Mellon Foundation, JHU has conducted a comprehensive and diverse technology analysis of repositories and services (https://wiki.library.jhu.edu/display/RepoAnalysis/ProjectRepository). In addition to these detailed explorations of repositories, the DKC is also involved in two UK-based repository projects funded by the Joint Information Systems Committee (JISC). Through one of these JISC projects (http://jiscstore.jot.com/WikiHome), the project team will further define and articulate the specific needs of astronomers for data curation, especially as it relates to electronic publications. JISC has coordinated its Digital Repository Programme training and development with JHU’s Mellon-funded repository analysis, and invited PI Choudhury to two recent meetings in the UK and the Netherlands. Finally, the DKC evaluated LOCKSS (Reich et al. 2001), and continues to manage a LOCKSS appliance. Through our extensive, rigorous and objective evaluation, we have concluded that Fedora represents the best system for this particular effort. As mentioned previously, Fedora disseminators enable the association of digital objects and behaviors or renderings. A Fedora dissemination is created by associating a method definition (behavior definition or BDef) and a corresponding method implementation (behavior mechanism or BMech) with a data object. In addition to providing a default dissemination based on content type (e.g., image, spectra, catalog), we will use these facilities to link deposited content with appropriate NVO services. 
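The relationship between a data object, a behavior definition, and a behavior mechanism can be pictured with a small conceptual model. The sketch below is written in Python purely for illustration; it is not Fedora's API and does not reproduce its object model. All class and function names, the `demo:1` identifier, the `https://example.org/sky-service` endpoint, and the `POS`/`SIZE` parameter names are assumptions chosen to show how data from a deposited object might be passed as parameters to an external sky service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple
from urllib.parse import urlencode


@dataclass
class DataObject:
    """A deposited item plus the metadata extracted at ingest."""
    pid: str
    content_type: str
    metadata: Dict[str, float]


@dataclass
class BehaviorDefinition:
    """Abstract contract: the method names an object type promises."""
    name: str
    methods: Tuple[str, ...]


@dataclass
class BehaviorMechanism:
    """Concrete implementation of every method in one definition."""
    implements: BehaviorDefinition
    handlers: Dict[str, Callable[[DataObject], str]]

    def __post_init__(self):
        missing = set(self.implements.methods) - set(self.handlers)
        if missing:
            raise ValueError(f"unimplemented methods: {sorted(missing)}")


def disseminate(obj, mechanism, method):
    """Run one behavior against one object and return its rendering."""
    if method not in mechanism.implements.methods:
        raise KeyError(method)
    return mechanism.handlers[method](obj)


def sky_region_link(obj):
    """Build a link to a hypothetical sky service centred on the object."""
    query = urlencode({
        "POS": f"{obj.metadata['ra_deg']},{obj.metadata['dec_deg']}",
        "SIZE": obj.metadata["size_deg"],
    })
    return f"https://example.org/sky-service?{query}"


# Hypothetical definition/mechanism pair for image objects.
IMAGE_BDEF = BehaviorDefinition("image-behaviors", ("sky_region_link",))
IMAGE_BMECH = BehaviorMechanism(IMAGE_BDEF, {"sky_region_link": sky_region_link})

DEMO_IMAGE = DataObject(
    pid="demo:1",
    content_type="image",
    metadata={"ra_deg": 202.4842, "dec_deg": 47.1953, "size_deg": 0.2},
)
```

With these definitions, `disseminate(DEMO_IMAGE, IMAGE_BMECH, "sky_region_link")` returns a URL whose query string carries the object's position and size. Separating the contract from its implementation in this way means a different mechanism could be swapped in for another service without changing the stored object.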
These interfaces will allow users of a digital object to be drawn into a rich interactive experience that uses data from the selected object as the starting point for further discovery. For example, the image content (TIFF, JPG, etc.) of an image digital object might be displayed to the end user with links to a web service that allows interaction with a larger portion of the sky. Data from the object would be passed as parameters to the service, allowing the service to focus on the correct portion of the sky and, perhaps, to highlight appropriate objects.

To support search and discovery, technical and descriptive metadata will be extracted from deposited objects and mapped to one or more common metadata formats. While the details will be developed over the course of the project, it is anticipated that these formats will include simple Dublin Core (http://dublincore.org/) and a format designed to support information specific to the astronomy domain. These will be made available using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH at http://www.openarchives.org/OAI/openarchivesprotocol.html). Content will be harvested frequently so that new data quickly become available to the astronomy community.

An Association of Libraries

In the longstanding library tradition of collaboration with and distribution of preservation responsibilities, we will work with two library partners from the University of Edinburgh and the University of Washington. The Fedora-based appliance that we will develop offers a low-maintenance, low-resource method for participation. At the most basic level, we could develop this appliance and deliver it to our partners, who can manage it with minimal effort. In fact, it will not even be necessary to have local Fedora expertise. Such low barriers to entry and maintenance will bolster the prospects for other institutions to join our association.
However, in addition to an ideal geographical distribution, our initial library partners each bring valuable and distinct perspectives that lead to a more substantive association. JHU and the University of Edinburgh will build upon existing partnerships between both astronomy researchers and libraries. The JHU astronomy team has worked closely with astronomers at Edinburgh. Andrew Lawrence, the Head of the School of Physics at Edinburgh and Project Leader of the AstroGrid Project, has provided a letter of support that demonstrates the enthusiasm and interest from Edinburgh (see supplementary documentation). Lawrence's letter outlines the different expertise available at Edinburgh through the UK e-Science Center and the newly formed and JISC-funded Digital Curation Centre (DCC). From the DCC's website (http://www.dcc.ac.uk): The DCC's emphasis complements very well the Sheridan Libraries' emphasis on evaluation of digital repository systems. Additionally, the Sheridan Libraries and Edinburgh University Library have a formal partnership that has resulted in several collaborative efforts, including an existing NSF Information Technology Research (ITR) grant. Sheila Cannell, the Director of the Edinburgh University Library, has provided a letter of support in this regard (see supplementary documentation). The University of Edinburgh brings an international perspective to this effort, which is critical to considering digital archiving in its fullest sense. The University of Washington Libraries, part of the DSpace Federation, has a production system in place (https://digital.lib.washington.edu/dspace/index.jsp).
Washington chose to participate as early adopters of DSpace with the goals of digital preservation, influencing scholarly communication and considering possible integration with the Digital Well, "a collaborative effort between ResearchChannel, UWTV, KEXP and UW Computing & Communications Advanced Systems Technologies Group to explore discovery, distribution and use technologies surrounding digital media collections on IP based networks" (http://digitalwell.org). With funding from the Mellon Foundation, the University of Washington has examined digital scholarship. This examination considered "creation of digital technology, tools and services to solve problems in scholarship" (http://www.lib.washington.edu/digitalscholar/index.html). By running a Fedora appliance at one of the DSpace federation libraries, we hope to persuade other libraries that running both DSpace and Fedora based services is possible and appropriate. This association of libraries with the NVO represents an excellent collaborative team; the connection to publishers provides the final component of our triad of partners.

Collaboration with Publishers

Cooperation from the key publishers in the astronomical community is essential for successful implementation of this proposal. To address the goal of transforming scientific scholarly communication using archived datasets, it is essential that the standards and protocols specifically developed in this proposal merge smoothly with current usage at the major journals. Fortunately, this adoption is considerably simplified by the central role played by a handful of publishers in the professional community. In North America, the publications sponsored by the American Astronomical Society account for almost every significant professional venue. These journals have been in the forefront of the switch to electronic publishing.
For example, the Astrophysical Journal (ApJ), published by the University of Chicago Press (UCP) for the AAS, now maintains an IT expert as a full time staff member in the editorial office, and the archival version of the journal is the on-line version rather than the paper version. This shift has opened the possibility of publishing extensive machine-readable data tables in the on-line version. It has also raised worries about developing and maintaining standards for data publication, as well as the connection between the articles and the data on which they are based. There are three points that require close cooperation with the AAS and UCP from the very beginning: the development of keywords linking datasets with information about the particular instruments used to acquire the data, finding acceptable standards for linking scientific articles with particular datasets, and the labeling and storage of the datasets so as to make them a useful adjunct to the scientific literature. Our team includes individuals with deep ties to the publishing work of the AAS: Robert Hanisch, who, in addition to acting as Project Manager for the highly distributed NVO, has served as chair of the Publication Board of the AAS, and Ethan Vishniac, who serves as Associate Editor-in-Chief for the Astrophysical Journal (and previously as Scientific Editor for seven years). The most relevant AAS employee is Greg Schwarz, working at the Tucson office of the Astrophysical Journal, who has been responsible for developing machine-readable formats for ApJ papers and is currently developing keywords for astronomical facilities. We will be in regular contact with him, and any other relevant AAS employees, to make sure that the work done here is consistent with the current usage, and future needs, of the AAS.
Robert Milkey, Executive Director of the AAS, and Julie Steffen, Associate Journals Manager and Director, Astronomy Journals at UCP, have provided letters of support that outline their respective organizations’ endorsement of and commitment to work on this project (see supplementary documentation). Steffen will also play a key role in the business model development of this effort, as described in the Sustainability section. The work outlined in this proposal represents a unique and groundbreaking effort. Currently, there is no data curation infrastructure within a library that directly supports scientific researchers. The project team members have an extensive network of contacts and professional commitments that provide ample evidence that we are not replicating existing work. One project that emulates our organizational partnership is CLOCKSS (http://www.lockss.org/clockss/), which “is a collaborative initiative by a group of organizations drawn from publishers, libraries and learned societies.” However, there are noteworthy differences with our proposed effort. From a technological perspective, this proposal outlines a repository-based content storage layer, which will support rich interaction with the content (beyond viewing of content only), and a data model and metadata. We believe these components are necessary for full-fledged digital preservation, especially as it supports citation within publications. CLOCKSS, as the name implies, is based on the LOCKSS technology (http://www.lockss.org). As mentioned previously, the DKC is familiar with LOCKSS. While it represents a valuable mechanism for creating distributed bit replication of content, it requires that the content already be stored somewhere. This assumption may be reasonable for electronic articles, but it is not true for the accompanying datasets that represent the focus of our effort. Additionally, LOCKSS does not include a metadata component, unlike our proposed effort.
The emphasis on datasets also differentiates our proposed work from Portico (http://www.portico.org). PI Choudhury and Co-PI DiLauro have met with Portico officials, who have confirmed that they are not currently focusing on data curation. Even with the similar composition of organizational partners with CLOCKSS, this proposal differs in one major manner: the direct and substantive involvement from the researchers who are creating the large, digital scientific datasets (in addition to the relevant professional societies). These researchers, who are motivated by an urgent and real need, will provide the expertise to ensure appropriate development of the Fedora-based appliance, and the feedback necessary to evaluate the outcomes of this proposed effort.

Project Evaluation

IMLS’ Outcome-Based Evaluation (OBE) stresses the question: “What changed as a result of our work?” This proposal addresses an urgent gap or need identified by the individuals (the astronomers) most affected by this gap. Not surprisingly, they are in the best position to evaluate the results of this proposed work. Their direct involvement in this proposal bolsters this prospect. We can measure the specific outputs of this proposal fairly easily. For example, the DKC learned a great deal about metrics or measurements related to large-scale, bulk ingestion of content as a result of AIHT. We will use these lessons to evaluate the effectiveness of the ingestion of astronomy data into the Fedora-based repository. The NVO team will assess whether the Fedora-based web services work properly to support the NVO web-services framework. Our partner libraries at Edinburgh and Washington can provide feedback and evaluation of the installation process and effort, and the relative ease of ongoing management of the Fedora-based appliance. The system that is developed must be cost-efficient, extensible, and financially sustainable.
We will be analyzing the business model for digital data and content preservation, primarily with support from other organizations and in-kind contributions from collaborators. Having said this, this proposal offers the potential of far more significant outcomes. If successful, the proposed work could change the nature of scholarly communication for astronomers by providing persistent access to cited datasets. This work will move libraries to the center of data curation efforts, and establish a critical set of partnerships between these libraries, publishers, professional societies and the researchers themselves. These more significant outcomes are not as easily measured. Nonetheless, we will track the rate of data deposit into this system by astronomers who are publishing papers. We will also examine whether other cyberinfrastructure projects engage their institutional libraries in a similar manner to the NVO and JHU Libraries. Data collection and analysis and integrity in data presentation are the central nervous system of modern research. This project will help to indemnify the huge investments of public and private funds in scientific research by establishing a means to preserve and protect digital content and underlying digital data. The traditional unit of output is the journal article. The preservation issues surrounding this public record of results and findings are just now being addressed. How we effectively manage the vital data upon which journal articles depend has not yet been discussed.

Project Resources: Budget, Personnel, and Management Plan

The personnel for this one-year demonstration project comprise digital librarians, a metadata librarian, programmers, astronomers and publishers from JHU, NVO, AAS and UCP. JHU represents the lead organization given the central role of the library in building and maintaining this data curation infrastructure.
Personnel

Sayeed Choudhury, Associate Director for Library Digital Programs and Hodson Director of the Digital Knowledge Center at JHU, will act as Administrative Head. Choudhury has been the Principal Investigator for ten digital library projects. Most recently, he has been chosen as one of the technical auditors for the Center for Research Libraries/Research Libraries Group (CRL/RLG) repository certification and audit exercise. The budget request includes cost-sharing of 10% FTE salary, fringe benefits, and indirect costs per year for Choudhury, an amount based on experience from his other grant-funded projects.

Tim DiLauro, Digital Library Architect, Library Digital Programs at JHU, will act as technical lead, a role he has played for the digital library projects at JHU. Along with Choudhury, he has been chosen as the other technical auditor for the CRL/RLG repository certification and audit exercise, and he acts as JHU’s representative to the Library of Congress’ NDIIPP Preservation Partners planning meetings. The budget request includes cost-sharing of 10% FTE salary, fringe benefits, and indirect costs per year for DiLauro, an amount based on experience from his other grant-funded projects.

David Reynolds, Metadata Librarian at JHU, will lead the metadata development effort. Reynolds has provided the metadata expertise for the digital library projects at JHU, including the AIHT project. The budget request includes cost-sharing of 10% FTE salary, fringe benefits, and indirect costs per year for Reynolds, an amount based on experience from his other grant-funded projects.

Alex Szalay, Alumni Centennial Professor of Physics and Astronomy at JHU, is the Principal Investigator for the NVO. Szalay has extensive support from NSF for this work with both the Sloan Digital Sky Survey and the NVO, encompassing both research and educational outreach. His leadership and eagerness to approach the Library for data curation provide the inspiration for this proposal.
The budget request does not include support for Szalay because his contributions are consistent with his NSF-supported activity.

Ethan Vishniac, Professor of Physics and Astronomy and Director of the Center for Astrophysical Sciences at JHU, is the Associate Editor-in-Chief for the Astrophysical Journal. Vishniac has received NSF funding for work related to this proposal. He will act as the liaison with the publishers. The budget request does not include support for Vishniac because his contributions are consistent with his editorial role with the Astrophysical Journal.

Ann Lally, Head of Digital Library Initiatives, and Jennifer Ward, Head of Web Services, at the University of Washington, will work with DiLauro toward the installation of the Fedora-based appliance at Washington. Lally, who previously worked with well-known digital library researcher Hsinchun Chen at the University of Arizona, played a key role in the examination of digital scholarship. The budget request includes IMLS funding for 2% FTE salary, fringe benefits and associated indirect costs for both Lally and Ward.

John MacColl, Sub-Librarian, Digital Library Division, at the University of Edinburgh, will work with DiLauro toward the installation of the Fedora-based appliance at Edinburgh. MacColl leads the JISC-funded STORE project that will identify scholar needs for data repositories. Additionally, MacColl has recently co-authored a book on the institutional repository. MacColl will participate in this project without funding from IMLS, relying upon JISC support instead.

The budget request also includes IMLS funding for portions of three programmers, one from the Libraries at JHU, and two from Physics and Astronomy at JHU. These programmers, with specific, relevant experience and expertise with AIHT and NVO, will focus primarily on the programming for the Fedora-based appliance and for the NVO data ingestion.
As mentioned previously, the letters of support from Robert Milkey (AAS) and Julie Steffen (UCP) confirm their commitment to this project (without funding from this proposal).

In addition to the personnel outlined in this proposal, a few individuals will work on related activity with funding from existing or newly identified sources. Robert Hanisch, Space Telescope Science Institute and Project Manager of NVO; Michael Kurtz, Harvard-Smithsonian Astrophysical Observatory; and Ray Plante, National Center for Supercomputing Applications (NCSA), will work on astronomy-specific technical matters. Terry Ehling, Director of Innovative Publishing at Cornell University Library, who oversees the development of DPubS, an open-source electronic publishing system (http://dpubs.org/), will consider the connections to electronic publishing systems that will support data deposit procedures at the point of article creation.

Budget

The total budget request for this proposal is $278,601, with $201,471 requested from IMLS and $77,130 (28%) offered as cost sharing. The cost sharing takes the form of salary, fringe benefits and associated indirect costs for the JHU Library staff, and half of the equipment costs, which comprise three servers and associated hard disks for the astronomy datasets to be installed at JHU, Edinburgh and Washington. All senior personnel from JHU Library offer their contributions as cost sharing, a decision that reflects the commitment by JHU Library to embrace data curation as a core activity. It should also be noted that this project team has approached both the Scholarly Publishing and Academic Resources Coalition (SPARC at http://www.arl.org/sparc) and Microsoft for complementary funding. Specifically, potential SPARC funding would support business model and sustainability efforts, and Microsoft funding would support astronomy-specific technical work.
We are cautiously optimistic about both sources of funding (which would support complementary activities, not the ones outlined in this proposal), especially given feedback from both organizations. Rather than assume that IMLS should fund the entire effort, this proposal focuses on the library-specific, core aspects of the data curation efforts, which represent the most appropriate aspects for an NLG proposal.

The travel request of $15,000 may be higher than that of other NLG proposals. This amount reflects the involvement of an international partner (Edinburgh) and the plans for dissemination at several conferences, including a relevant international conference. The budget descriptions include a spreadsheet outlining potential travel costs.

Management Plan

The project team already holds regular teleconference calls, communicates via email, and convenes in-person meetings (both at JHU and UCP). This proposal reflects a longstanding dialogue and process, and a shared understanding of needs, goals, objectives and the delineation of responsibilities. If funded, this effort will benefit from this prior collaboration and communication. Given its existing collaborative grant-funded projects, the Library Digital Programs at JHU has developed an extensive web-based project management system that includes a Confluence-based wiki for collaboration (http://wiki.library.jhu.edu), and dotproject (http://www.dotproject.net), an open-source project management tool. Both of these tools support document sharing, collaborative development of tasks, email notification and reminders, event and time tracking, automatic Gantt chart generation, and to-do lists. Combined with JHU’s continuing development of web-based access to administrative information, this project will possess a rich technology infrastructure to augment the already established human connections.
These tools are intended for internal project management and communication, but they are complemented by a set of external web-based resources described in the dissemination section.

Dissemination

The Library Digital Programs (LDP) at JHU has recently developed a web portal available at http://ldp.library.jhu.edu, built using open-source technology. For existing JHU projects and items of interest, we have developed RSS feeds that can be automatically read through RSS readers or the Firefox browser. Additionally, LDP staff have used blogs to disseminate information during conferences, provide project updates, and share ideas, and have used web-based forums to solicit, document, and respond to feedback from the broader community. This DSS project will have an individual presence within the overall LDP web portal. The associated forum will complement the wiki and dotproject communication and collaboration tools described earlier.

The members of the project team have an extensive track record of presentations, publications and involvement in panels or forums related to digital libraries. It is worth mentioning that Alex Szalay has already presented at the 2003 Web-Wise Conference regarding the importance of data curation and the possible role for libraries in this regard. The results from this proposed effort represent an excellent follow-up to Szalay’s presentation. JHU and the University of Washington are members of both the Coalition for Networked Information and the Digital Library Federation. Choudhury, DiLauro and Lally have presented at both meetings, and will provide presentations related to the results of this work. Choudhury attended the Ensuring Long-term Preservation and Adding Value to Scientific and Technical Data conference (PV 2005 at http://www.ukoln.ac.uk/events/pv-2005/), which focused on international e-science issues and projects.
One of the organizers of the conference asked Choudhury to consider a presentation proposal for a future PV Conference related to the JHU Library’s work with NVO data curation. Choudhury, DiLauro, and Reynolds have published in several forums including D-Lib Magazine. In addition to the digital library related venues and opportunities, we can rely upon the astronomy-related prospects as well. As an example, the NVO team has published extensively regarding all aspects of their project, including educational and outreach efforts. A list of these publications is available at http://www.usvo.org/pubs/index.cfm.

Sustainability

There are multiple aspects of sustainability associated with this proposal. First, one of the main reasons that the NVO wishes to work with the Library is that libraries represent a sustainable organizational entity, as compared to a project-based entity such as NVO. It should be noted that the fiscal year 2007 budget request from JHU Library includes a request for a digital preservation specialist who would focus on data curation activities across multiple disciplines. This proposed work would provide the foundation of expertise, knowledge and infrastructure for this individual to continue and sustain. Second, the core technology for the repository-based appliance is Fedora, which has built a diverse user and developer community. By building upon a technology platform that has widespread adoption and interest, we bolster our possibilities for ongoing support and development. Perhaps most importantly, the development of this prototype appliance will provide insights into the costs of development, installation, and ongoing maintenance of such an appliance (covering both human and technology infrastructure). UCP provides appropriate expertise and experience in the development of business and financial models to analyze and build upon these cost-related findings.
UCP already provides the financial and business home for astronomical journals, so it understands the domain well. This familiarity and understanding provide the final piece to ensure that the important results and findings from this proposal will continue into the future, so that scientists can look to libraries for leadership in data curation.