archaeoinformatics.org
DIGITAL ANTIQUITY: Planning a Digital Information Infrastructure for Archaeology
Grant Period: 7/1/2007-6/30/2008

ARCHAEOINFORMATICS.ORG Steering Committee
Keith Kintigh, Convener, Arizona State University
Jeffrey Altschul, SRI Foundation
Tim Kohler, Washington State University
Fred Limp, University of Arkansas
Julian Richards, University of York
Dean Snow, The Pennsylvania State University
also
John Howard, Arizona State University
C. Lee Giles, The Pennsylvania State University

Arizona State University, Statistical Research, Inc., The Pennsylvania State University, University of Arkansas, Washington State University
Andrew W. Mellon Foundation Archaeology All-Projects Meeting, New York, March 5-6, 2008

DIGITAL ANTIQUITY: Planning a Digital Information Infrastructure for Archaeology
• Planning Grant In Progress (7 months)
  – Goals: Preservation, Discovery & Access, & Data Integration
  – Scope: Newly Created and Legacy Data
    • CRM & Academic Archaeology
    • Documents, Databases, and Images, + Plan for Geospatial & Exotic
  – Initial Context: Americanist Archaeology
• Needs and Vision
• Organization
• Assessment
• Prototyping Platform & Tools
• Jump-starting Case Studies
• Current State of Planning
  – Technical, Social, Financial

Disciplinary Needs for Broad-based Information Infrastructure
• Explosion of digital information
  – $1 Billion/year in Archaeology (US)
  – 50,000-100,000 reports/year, 1000s of databases (US)
• Discovery & Full Access
  – Lack of availability (on-line or otherwise) of information resources
  – Absence of intelligent discovery tools
  – Problem of data standardization
  – Lack of tools to enable semantic integration
• Digital Preservation Problems
  – Absence of existing facilities for preservation
  – Media degradation & software obsolescence
  – Degradation & loss of data semantics (metadata)

Professional Ethics
• SAA Principle No. 5: Intellectual Property
  “Intellectual property, as contained in the knowledge and documents created through the study of archaeological resources, is part of the archaeological record. As such it should be treated in accord with the principles of stewardship rather than as a matter of personal possession.”
• SAA Principle No. 7: Records and Preservation
  “Archaeologists should work actively for the preservation of, and long term access to, archaeological collections, records, and reports.”
Society for American Archaeology - Principles of Archaeological Ethics 1996

Archaeoinformatics.org Vision
• Sustainable, general-purpose infrastructure
  – New and legacy digital archaeological data
• Goals for the infrastructure
  – Preservation (with Versioning and Persistent Cite-ability)
  – Discovery & Access
  – Data Integration
  – Interoperability: discovery and download with related infrastructures
• Registration of Extensive, Machine Processable Metadata
  – Integrated in workflows of those generating the data
• Initial focus on delivering tools for:
  – Text (Gray Literature)
  – Databases
  – Images
• Work with ADS and others on metadata standards
  – Fostering interoperability
  – For Photogrammetry/Geospatial/CAD/Remote Sensing, LiDAR, Laser Scanning (HDS), Geophysical Data

Outcomes
Advance humanistic understandings of the past
• Sustainable and reliable digital data & metadata preservation
• Ability to re-evaluate hypotheses and arguments
• Improved research through increased reuse of existing data
• Enable large-scale & synthetic research
• Time & cost savings
• More effective use of research $
• Expanded availability to the broader public

Organization
Steering Committee
Keith Kintigh, Past President SAA, Arizona State University
Jeff Altschul, Past Treasurer SAA, SRI Foundation
Tim Kohler, Past Editor, American Antiquity, Washington State University
Fred Limp, Past Treasurer SAA, University of Arkansas
Julian Richards, University of York (UK)
Dean Snow, President SAA, The Pennsylvania State University

Board of Directors
Brian Crane, DOD/Versar, Inc.
Katherine Emery, Florida Museum of Natural History, University of Florida
Sebastian Heath, Archaeological Institute of America
Eric Kansa, University of California at Berkeley, Alexandria Archive Institute
Francis McManamon, Chief Archaeologist, National Park Service
Worthy Martin, University of Virginia
Fraser Neiman, Thomas Jefferson Foundation, University of Virginia
Vincas Steponaitis, Past President SAA, Research Laboratories of Anthropology, University of North Carolina
Herbert Van de Sompel, Los Alamos National Laboratory
Phillip Walker, Past President AAPA, University of California at Santa Barbara
Willeke Wendrich, University of California at Los Angeles
Thomas Whitley, Brockington & Associates
Mellon Funded Project Participants

NSF-Access Grid Virtual Lectures
• Eric C. Kansa - Executive Director of the Alexandria Archive Institute
  – "Open Context: Community Tools for Publishing Research Data on the Web"
• Chaitan Baru - Director of Science Research and Development at the San Diego Supercomputer Center
  – "GEON: Geosciences Network"
• Michael J. Halm (& John Yoo) - Senior Strategist and Manager for the Special Project activities for the Teaching and Learning with Technology group, Penn State University
  – "LionShare: Secure P2P File Sharing and Collaboration"
• Mark Gahegan (& Chaitan Baru, Boyan Brodaric) - Professor of Geography and affiliate professor of Information Science and Technology at the Pennsylvania State University
  – "Sharing our resources, sharing our understanding: Cyberinfrastructure for Archaeology"
• Fred Limp - Leica Chair and Director, Center for Advanced Spatial Technologies, University of Arkansas
  – "Interoperability and net-centric architectures: lessons for archaeoinformatics from the Open Geospatial Consortium"
• Mark Schildhauer - National Center for Ecological Analysis and Synthesis, Santa Barbara
  – "Ecological informatics: challenges and approaches, and potential relevance for archaeology"
• Julian D Richards - Professor of Archaeology, University of York and Director, Archaeology Data Service
  – "Current challenges for digital preservation and delivery"
• Ian Johnson - Archaeological Computing Laboratory, University of Sydney
  – "ECAI: The snowball still survives"
• Katherine Skinner - Digital Projects Librarian at the Emory University Libraries
  – "Collaborative Adventures in Distributed Digital Preservation: The MetaArchive Cooperative and the Educopia Institute"

Assessment: On-Line Survey
User Needs and Attitudes
• 270 responses primarily from members of the SAA’s Digital Data Interest Group
• 94% responded that documentation of the archaeological record is being lost
• 94% responded that they would use electronic data more if it were accessible
• 90% responded that it is the responsibility of a project sponsor to fund and ensure curation of databases
• More than 60% responded that users should not be charged access fees

Phase I - Case Studies
• Goal: Demonstrate research value to the community
• Criteria for implementation case studies
  – driven by compelling research questions
  – executed by multi-institutional cooperatives
  – at least one international
  – at least one with large component of legacy data
  – at least one with large component of recent CRM data
• Southwest
  – Dolores Archaeological Project
  – Bandelier Archaeological Excavation Project
• Central & West Mexico
  – Teotihuacan Mapping Project
  – La Quemada, Zacatecas
• Fauna
  – Southwest & Midwest (NSF)

Strategic Emphases
• Access & Discovery
• Preservation & Archiving
• Interoperability
• Leveraging existing open source initiatives
• Incorporating Web 2.0 characteristics
• Building on ADS and European experience
• Develop next generation shared infrastructure
• Prototypes allow us to assess potential designs
  – Platform
  – Tools
  – Institutional Structure

Staged Implementation - Level 1
• Interoperable gateway
  – Development/adoption of publish and discovery specifications and tools for federated search
  – Package & search high-level metadata (not looking inside resources)
• Creation of preservation archives
• Development/adoption of best practices for workflows
  – Building collaboratively on ADS “Guides to good practice”
• “Test bed” pilot projects
  – Focused on “high value” data sets & information
  – Investigate automated search and ontology development tools/strategies
    • e.g., Lagoze & Van de Sompel (interoperability)

Staged Implementation Longer Term - Level 2
• Address complex semantic & ontology issues
  – Engage expert and institutional groups
  – Workshops, etc.
• Develop/adopt semantic web tools to assist semantic mapping & ontology development
  – Integrate complex document search
  – Semantic database integration
  – Image and location search

Publish and Discovery
• Interoperate using core metadata specs including ability to access actual data
  – e.g., what, where, when, permissions/control
• Adopt existing metadata discovery & aggregation toolkits
  – English Heritage Gateway
  – ARENA - Archaeological Record of Europe Networked Access
  – MIDAS/CIDOC (ISO 21127) Harmonization
• Consistent with OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting)
• Assign unique and persistent addresses for resources
• Institutional adoption of specifications & best practices
  – e.g., Fedora, DSpace,…?
• Community Building
• Workshops and evangelization

Tentative Archive Architecture
• Base Platform
  – Central metadata catalog
  – Search & discovery
  – Trusted repository for information resources
  – Expanded access to distributed resources if registered
• Federated Repositories (Branding)
  – running same software stack
• Discovery & Access through Other Repositories
  – Based on metadata sharing standard (e.g., OAI)

Prototyping: Archive Platform
• Prototype platform for an open source, Internet accessible archaeological information infrastructure (NSF Funded)
• tDAR already provides basic preservation, discovery and access functions
• tDAR provides concept-oriented access and semantic integration across datasets
• tDAR focuses on databases but also processes text and images
• One element of an international federated structure

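Note on metadata harvesting: the slides name OAI-PMH as the metadata sharing standard for discovery across repositories. A minimal Python sketch of what a harvester does follows, assuming a hypothetical repository endpoint (repository.example.org) and an invented sample record; it builds a standard ListRecords request and reads the "what, where, when, permissions" fields from unqualified Dublin Core. It illustrates the protocol only and is not the project's own code.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (base_url is hypothetical)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Invented response, trimmed to the elements this sketch reads.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:report-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical survey report</dc:title>
          <dc:coverage>Southwest US</dc:coverage>
          <dc:date>1985</dc:date>
          <dc:rights>public</dc:rights>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return one dict per harvested record: identifier plus Dublin Core fields."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        ident = rec.find(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        fields = {}
        for el in rec.iter():
            # Keep only elements in the Dublin Core namespace.
            if el.tag.startswith(f"{{{DC_NS}}}"):
                fields.setdefault(el.tag.split("}")[1], []).append(el.text)
        records.append({"identifier": ident.text, **fields})
    return records


url = list_records_url("http://repository.example.org/oai")
records = parse_records(SAMPLE_RESPONSE)
print(url)
print(records[0]["title"], records[0]["coverage"], records[0]["date"])
```

Because only high-level metadata is exchanged, a gateway built this way can search across repositories without looking inside the resources themselves, which matches the Level 1 scope.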
tDAR’s Approach
• Users: Anyone may register; approval for contributors
• Register Project & Resources to tDAR Metadata Catalog
  – Goal: Preserve the original semantics of data
  – Project & Resource metadata (extended Dublin Core)
  – Extensive machine processable metadata at the level of data tables, columns, and values
  – Upload Files (or Point to Distributed Resources)
    • Text files in ASCII or PDF; Images in JPG and TIFF
    • Databases ingested as Access®, Excel®, or CSV files, then converted to PostgreSQL for search integration & maintenance
• Search: metadata or resource content (db or text)
  – Add ontology-driven concept-oriented search
  – Add search & download to other infrastructures, such as ADS or OpenContext
• Download: Resources for further analysis
  – Add semantic integration across databases, output integrated databases
  – Complete citation information
• Add Semantic Data Integration (output integrated databases)

tDAR: Semantic Integration
• tDAR will reconcile the semantic demands of a query with the semantic content of the available datasets (rather than global reconciliation of data sources).
• tDAR uses query-driven, ad-hoc data integration in which, given a query, it will
  – identify relevant data sources
  – reason with potentially incomplete or inconsistent information
  – perform interactive, on-the-fly metadata matching to align key portions of the metadata
  – interact, as necessary, with the user
• Expands on ADS Capabilities
  – open source code available for reuse

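Note on query-driven integration: the idea of reconciling a query with column- and value-level metadata, rather than reconciling all data sources globally, can be sketched in Python as below. This is an illustration under stated assumptions, not tDAR's implementation: the Dataset class, the integrate function, the concept names, and the three small faunal and ceramic datasets are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """One registered table plus its column- and value-level metadata (hypothetical)."""
    name: str
    rows: list                 # list of dicts, one per table row
    column_concepts: dict      # local column name -> shared concept
    value_maps: dict = field(default_factory=dict)  # concept -> {local code -> shared term}


def integrate(datasets, concepts):
    """Return (integrated rows, unmapped values) for the concepts a query asks for."""
    integrated, unmapped = [], set()
    for ds in datasets:
        concept_to_column = {c: col for col, c in ds.column_concepts.items()}
        # Identify relevant sources: skip datasets lacking a requested concept.
        if not all(c in concept_to_column for c in concepts):
            continue
        for row in ds.rows:
            record = {"source": ds.name}
            for concept in concepts:
                raw = row[concept_to_column[concept]]
                mapping = ds.value_maps.get(concept)
                if mapping is None:
                    record[concept] = raw            # no coding scheme, keep as-is
                elif raw in mapping:
                    record[concept] = mapping[raw]   # align local code to shared term
                else:
                    record[concept] = None           # incomplete metadata: flag for the user
                    unmapped.add((ds.name, concept, raw))
            integrated.append(record)
    return integrated, unmapped


site_a = Dataset(
    "site_a_fauna",
    rows=[{"SPEC": "OD", "CNT": 12}, {"SPEC": "LP", "CNT": 30}],
    column_concepts={"SPEC": "taxon", "CNT": "count"},
    value_maps={"taxon": {"OD": "deer", "LP": "rabbit"}},
)
site_b = Dataset(
    "site_b_fauna",
    rows=[{"taxon_name": "Deer", "n": 5}, {"taxon_name": "Unknown bird", "n": 2}],
    column_concepts={"taxon_name": "taxon", "n": "count"},
    value_maps={"taxon": {"Deer": "deer"}},
)
site_c = Dataset(
    "site_c_ceramics",
    rows=[{"ware": "gray", "n": 40}],
    column_concepts={"ware": "ware", "n": "count"},
)

rows, needs_review = integrate([site_a, site_b, site_c], ["taxon", "count"])
for r in rows:
    print(r)
print("ask the user about:", needs_review)
```

Only the mappings needed by the query are consulted, and values without a mapping are returned for user review instead of being silently dropped, which mirrors the interactive matching step in the list above.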
• Integrated search engine for archaeology
• Searches text, citations, maps, tables, locations, time
• Prototype data: 8,000 documents from JSTOR archaeology journals
• Leverages other open source projects - Lucene indexer
• JSTOR metadata used for metadata extraction and indexing
• ChemXSeer, chemistry, table extraction and indexing (at Penn State)
• Will use aspects of CiteSeerX ingestion, indexing and crawling
• Table search and data extraction
  – Extract data from tables in an XML OAI format
  – For use in other experiments or data aggregation
  – Provide open source extraction tools for other systems
• Progress to date from a 6 month effort
• http://cxs02.ist.psu.edu:8080/archseer/

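Note on table extraction: the slides do not specify the XML format used for extracted tables, so the following Python sketch only illustrates the general step of turning a plain-text table into XML for reuse. The element and attribute names (table, row, cell, column) and the sample counts are hypothetical, and the parser assumes columns are separated by tabs or by two or more spaces.

```python
import re
import xml.etree.ElementTree as ET


def split_cells(line):
    """Split one table line on tabs or runs of two or more spaces."""
    return re.split(r"\s{2,}|\t", line.strip())


def table_to_xml(text, caption=""):
    """Convert a whitespace-aligned text table into an XML element tree."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = split_cells(lines[0])
    table = ET.Element("table", caption=caption)
    for line in lines[1:]:
        row = ET.SubElement(table, "row")
        for name, value in zip(header, split_cells(line)):
            cell = ET.SubElement(row, "cell", column=name)
            cell.text = value
    return table


# Invented example table; the counts are placeholders, not real data.
SAMPLE_TABLE = """
Site      Sherds  Lithics
Site A    120     34
Site B    45      8
"""

table = table_to_xml(SAMPLE_TABLE, caption="Hypothetical artifact counts")
print(ET.tostring(table, encoding="unicode"))
```

Keeping the column name on every cell lets downstream tools aggregate values from many extracted tables without needing the original page layout.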
Implementation Case Studies
• Expand Planning Grant Case Studies
  – Spread of agricultural societies in southwestern and southeastern US
• New Case Studies
  – Arkansas Archaeological Survey
    • Exemplar of US comprehensive system
  – North Carolina Gray Literature
  – Open Context - Çatalhöyük, Petra, etc.
  – UCLA Encyclopedia of Egyptology
  – Global History of Health
  – SRI human skeletal scan data
  – Others?

Social Demands on a Sustainable Digital Infrastructure
• Credible Organizational Structure
  – Multi-institutional Board of Directors & Executive Committee
• Buy-in from CRM and academic communities
  – Ease of use
  – Address confidentiality of archaeological site locations
  – Allow data in infrastructure to be private data for a time
• Buy-in from funding, reviewing, or permitting bodies
  – Assist in meeting accountability and management needs
  – Integrate registration in grant or compliance contract workflows
  – Automated checks of consistency & completeness
  – Project is not complete until Agency signs off on deposit
  – Strengthen US regulations (36 CFR 79) and formal guidance

Additional Social Demands on a Sustainable Digital Infrastructure
• Work with professional societies
  – Establishment of “industry standards” will help federal agencies mandate use
  – Require publication of digital data with journal articles
• Work with museums with responsibilities as digital data repositories
• Ensure proper credit is given to contributors
  – Citations with downloads
  – Usage statistics
  – Optional peer review
• Training: on-line and in person

Professional Society Buy-In
A related vision for databases developed by an NSF workshop and published in American Antiquity (Kintigh 2006) was endorsed by:
  – Society for American Archaeology
  – American Association of Physical Anthropologists
  – Society for Historical Archaeology

SAA Digital Data Interest Group
Purpose: To promote the preservation and sharing of archaeological data maintained in digital form.
  – The long-term conservation and protection of the archaeological record demands that we preserve digital documents, images, and databases, and make them available to other scholars in order to advance archaeological understandings of the past.
  – The interest group will foster the development of shared digital archives of archaeological data. It will promote data sharing and preservation to the broader archaeological community and enhance communication and collaboration among data sharing initiatives.
• Contact: Eric Kansa (UC Berkeley)
• 796 Members to Date (>10% of SAA membership)

Financial Dimensions of a Sustainable Digital Infrastructure
• Infrastructure development and startup from grants
• Revenues to maintain cyberinfrastructure from:
  – contracts with federal and state agencies to maintain and to provide access to publicly-funded archaeological data
  – disintermediation: capture savings from academic and CRM projects (e.g., 0.5% of $1 billion, about $5 million)
• Fundraising to develop a long-term endowment to support the cyberinfrastructure
• To the extent possible, user fees will not be employed
• Time to operational solvency
  – 5-6 years?

Acknowledgments
• Support from
  – The Andrew W. Mellon Foundation
  – NSF Grant IIS 0624341
• Steering Committee Institution Teams
• Disciplinary & Technical Advisory Board Members
Partners