FAQ and URL Lists Related to: “Developing a Computational Environment for Coupling MOR Data, Maps, and Models: The Virtual Research Vessel (VRV) Prototype”
NSF MG&G Data Management Workshop – May 14, 2001

Excerpt from the NSF ITR Proposal that funded VRV:

4.0 FAQS ABOUT ARCHIVAL DATABASES

Some of the often-raised issues and questions regarding an archival database and its supporting computational tools are: (i) how will the RIDGE database be maintained after the end of the granting period? (ii) how will the data be curated by the community in both the short and long term? (iii) how will new technologies be integrated into the system? and (iv) how will duplicative work be avoided? …

1. How will the RIDGE database be maintained in the long term?

Our first choice for housing the VRV archive is Oregon State University (OSU). This makes good sense because one of the PIs is resident at OSU and because OSU is located geographically “between” the four collaborators, which will allow easy collaboration after the current project ends. In addition, OSU is currently home to the RIDGE office, which will provide early logistical support. OSU is also home to one of the premier data centers of the NSF Long Term Ecological Research (LTER) sites; this center has 20 years of experience archiving research data across disciplines and making it available. One of the PIs (Cushing) has an ongoing collaboration with the data center. The full team of researchers is committed to providing a smooth transition from the prototype database to an operational system, and we will work with the community to identify two or three other laboratories where the operational RIDGE database could reside or which could provide additional help in deploying it. Candidates include the San Diego Supercomputer Center database lab (NPACI), the USGS EROS Data Center in South Dakota, and a digital library such as those at the University of Arizona or UCSB.
The current PIs commit to helping the archival laboratory write the initial follow-on proposal to maintain the database and respond to users. Support for the archived database could come from NSF grants; the NSF-supported “core” labs provide an example of how this might work.

2. Where will the database reside and how will it be distributed to users?

The database will be web accessible, but we will also prepare a CD-ROM with the data and software. If, under rare circumstances, there is no support for maintaining the database web site, or in the unlikely event that the mid-ocean ridge ceases to be an active area of research, the CD-ROM can be used for some years to disseminate the data. Eventually, a digital library archive for scientific data must be established; the PIs agree to find a “permanent home” for the database.

3. How will project tools be maintained?

This project proposes three kinds of tools: GIS scripts for browsing data, database tools for loading and validating data, and a computational environment. Standard off-the-shelf products will be used for the GIS and DBMS tools; once the database is developed, either the archival site or individual scientists will need to update the scripts as needed and share them with the community. Even if these tools are not maintained, the archival data will remain usable. At the end of the granting period, the computational environment will consist of a prototype that demonstrates the linking of models. The PIs will seek support from NSF to maintain and extend the computational environment (just as individual researchers maintain the SIOSEIS, MB, and GMT software through proposals renewed every 3 to 5 years). Cuny, in particular, is committed to extending the software, and the prototype will be made available for any individual scientist to use.

4. How do we assure that we are not reinventing what someone else has already done?
Although the 9°N RIDGE database involves developing some software that could be applied to other domains (the computational environment), much of what we propose is specific to the ridge community, e.g., the data model, the reformatting of data for viewing in the GIS, and the metadata database. At present, the ridge community does not have such a comprehensive database. While it is true that generic tools would make development more efficient, there are (as yet) no commercial generic tools that work across scientific domains. The computational environment is a research prototype, and we know from the computer science community that no such software is generally available. We will, of course, remain in contact with ridge researchers and the scientific computing community to capitalize on others’ work where appropriate.

5. How do we provide the communication necessary to ensure that what we do is not being done over and over?

We can help ensure that this work does not duplicate efforts within the RIDGE community by maintaining a web page, holding “Birds of a Feather” sessions and workshops at meetings (e.g., AGU), and publishing notes in the RIDGE Events newsletter. We will also publish our work in computer science journals and at the Scientific Database Workshops. We cannot guarantee that this work is not “duplicated” in other disciplines; indeed, if those scientists publish their findings or attend workshops, it becomes more likely that generic solutions to the problems we address will be found.

6. How do we assure that scientists actually submit data to the VRV?

Databases are useful only if people put their data into them. While we cannot legislate that scientists archive their data, we can make the process easier and ensure that any data in the archive adequately acknowledge the contributing scientist(s).
NSF/OCE requires that PIs make core and rock samples available two years after the “observation” period, generally through established core labs, and some underway geophysical data must be submitted to the National Geophysical Data Center (NGDC). Similarly, the OCE Data Policy and the NSF General Grant Conditions, which are more encompassing, specify that derivative data must be shared. Unfortunately, NSF is not very strict about enforcing these guidelines, and much of the data remain in localized collections at various institutions. We believe the community will support our proposed efforts by submitting data once the database is established and stable.

Some Related URLs:

Virtual Research Vessel (data sharing/web GIS portion) - http://dusk.geo.orst.edu/djl (temporary URL), http://davyjones-dell-rogue.geo.orst.edu (temporary URL)
Virtual Research Vessel as an Educational Tool (VRV-ET) - http://www.cs.uoregon.edu/research/vrv-et/
AHA-NEMO 2 EPR Data - http://ahanemo2.whoi.edu/
Boomerang 8 Data Products (Tonga Trench & Forearc) - http://dusk.geo.orst.edu/tonga, http://capnhook.geo.orst.edu (server being upgraded)
CoAxial Segment - http://www.pmel.noaa.gov/vents/coax/coax.html
Earth Ref Physical & Chemical Data and Models - http://www.earthref.org
Endeavour Segment GIS - http://bromide.ocean.washington.edu/gis/
Hawaii Mapping Research Group's EPR archive - http://www.soest.hawaii.edu/HMRG/Mesotech/EPR_Archive_Frame2.htm
Lucky Strike - http://drifor.whoi.edu/LuckyStrike96/index.html
NeMO Net Real-Time Monitoring & Data - http://newport.pmel.noaa.gov/nemo/realtime/
NOAA Vents Program Datasets/GIS - http://newport.pmel.noaa.gov/gis/data.html
RIDGE Multibeam Synthesis Project - http://imager.ldeo.columbia.edu/
"Digital Backyard" - USGS & Microsoft TerraServer - http://terraserver.microsoft.com/, http://www-nmd.usgs.gov/esic/esic.html
Federal Geographic Data Committee (metadata, clearinghouses) - http://www.fgdc.gov
GLOBEC (Global Ecosystem Dynamics) - http://globec.gso.uri.edu/ (zooplankton data server), http://www.usglobec.org/ (main U.S. site)
National Biological Information Infrastructure - http://www.nbii.gov
National Spatial Data Infrastructure - http://www.fgdc.gov/nsdi/nsdi.html
Oregon Coast Geospatial Clearinghouse - http://buccaneer.geo.orst.edu
University Consortium for Geographic Information Science (UCGIS) - http://www.ucgis.org