Working Group on Metadata for Researcher-Created Primary Data in History and Ethnography Case Statement Proposal

CASE STATEMENT PROPOSAL FOR THE METADATA FOR RESEARCHER-CREATED PRIMARY DATA IN HISTORY AND ETHNOGRAPHY WORKING GROUP

Contents
1. WG Charter
a. Short-term goals (M6)
b. Mid-term goals (M12)
c. Long-term goals (M18)
d. Timeframe
2. Value Proposition
a. Research case
b. Business case
3. Engagement with Existing Work in the Area
4. Work Plan
a. Work plan components
b. WG operation
5. Initial Membership
6. References
7. Appendix A: Leadership Biographical Notes

1. WG Charter
The Metadata for Researcher-Created Primary Data in History and Ethnography WG will conduct research, develop a statement of best practices and release an adoptable product centered on what needs to be in place (standards, protocols, policies, cultural expectations) to make ethnographic and historical data archivable, discoverable and shareable. In an initial phase we will identify and categorize a broad range of use cases within history and ethnography, surveying the relevant literature and conducting ethnographic interviews about the many kinds of data (such as recorded interviews, field notes and photographs) that require metadata in order to be archived and shared in different user scenarios (for example, born-digital interviews produced by early-career researchers, or newly digitized data from more established researchers). A second phase will document and analyze the diverse metadata practices found in a sample of the use cases and scenarios we identify, again through a literature review, an environmental scan of projects and interviews with project leads. Finally, we will propose best metadata practices for a variety of use cases and scenarios and facilitate the uptake of these deliverables within a number of projects using the Platform for Experimental and Collaborative Ethnography (PECE).
Each phase is expected to take six months. Early adopters of this WG’s deliverables will include research groups working with PECE, including The Asthma Files and The Disaster-STS Research Network. Beyond the 18-month timeframe of this WG, lessons learned from these early adopters will feed into the continued dissemination of these best practices by the RDA Digital Practices in History and Ethnography Interest Group (DPHE-IG). Our deliverables will make digital artifacts in ethnographic and historical research much easier to share, find, use and cite, perhaps even contributing to a credit/reward structure that would not only reduce barriers to sharing data in the digital humanities but also further incentivize it.

1a. Short-term goals (M6)
In our first six months we will review a wide range of use cases, identifying scenarios that historians and ethnographers (within and beyond RDA) encounter when working with metadata for shared artifacts. Preliminary work within the DPHE-IG suggests that researchers often struggle to develop appropriate metadata practices when digitizing and sharing the following data types, which are especially important to history and ethnography: field notes, interviews (audio and video), grey literature, images, analytic structures, structured annotations, surveys, maps, quantitative data, bibliographies, translations and workflows. Of these, we have identified X data types for this working group to engage with. This first phase will document this wide range of data use and reuse scenarios in greater detail. We will disseminate its findings as a table listing different artifacts and a range of associated user scenarios, together with their source metadata, provenance metadata, technical metadata and intellectual property metadata.

1b. Mid-term goals (M12)
Our second phase will document and analyze the diverse existing metadata practices of researchers in history and ethnography. To scope this project appropriately for our timeline, we will focus primarily on X. In our analysis, we will identify best practices that could be codified and distributed as the second component of our deliverables.

1c. Long-term goals (M18)
In our final phase we will facilitate the uptake of our second-phase deliverables (best practices) in two projects using the PECE platform: The Asthma Files and The Disaster-STS Research Network. Users will be able to enter the relevant metadata for artifacts such as images, documents, audio and video through a simple form-based interface. To capture metadata for various analytic structures, PECE developers will establish micro-attribution vocabularies that record the complex provenance of a particular analytic. We aim to learn lessons in this implementation phase that will facilitate uptake in further platforms and projects beyond the 18-month timeframe, with sustained support from the DPHE-IG.

1d. Timeframe
Phase one, identifying and categorizing use cases for metadata in history and ethnography, will run from April 2015 through September 2015, with findings presented at the RDA Sixth Plenary. From October 2015 through March 2016 we will collect data on the many ways researchers in history and ethnography address metadata issues, with a focus on interviews and field notes. Finally, from April 2016 through September 2016 we will codify best practices and work with researchers using the PECE platform to adopt these deliverables.

2. Value Proposition
Research case: Given the cultural and social complexity (as well as the technical, ecological and economic complexity) of many global problems today, collaborative empirical humanities research has renewed urgency.
The empirical humanities include history, folklore, cultural anthropology and other fields in which researchers collect primary data of different types for use in cultural analysis. Today, these researchers often need to collaborate to understand phenomena that operate across geographic regions, scales and communities. But established research practices and infrastructures in the empirical humanities do not support such collaboration. For decades, research in these fields has been an almost entirely individual enterprise. Field notes, found documents, found or researcher-created photographs and recordings, and other data used in cultural analysis are very rarely shared, except when reduced or rendered into some form of publication or museum display.

One of the primary barriers to sharing data within the empirical humanities is the lack of agreed-upon metadata standards and protocols for researcher-created primary data. While there has been a great deal of work in the cultural heritage arena, especially within museums and libraries, the proliferation of standards there has created problems of its own. Jenn Riley, for example, has identified 105 standards in the cultural heritage sector and notes that “the sheer number of metadata standards in the cultural heritage sector is overwhelming, and their interrelationships further complicate the situation.” In contrast, the RDA Metadata Standards Directory WG lists only one standard for heritage studies and one for anthropology (the Open Archives Initiative; see the section on Engagement with Existing Work in the Area below for critical commentary). Many researchers find themselves caught in the confusing space between a dizzying proliferation of standards and a one-size-fits-all approach that can miss the diversity of data practices within disciplines. We will produce a simple list of recommended metadata fields for a delimited set of artifact types, analytics and use cases.
Once endorsed by the RDA and taken up by early adopters, these best practices will be a go-to resource for researchers, who may then choose to modify (add or subtract) the fields we suggest. Development and uptake of shared metadata practices and tools will make researcher-created data more findable and usable within these research traditions. The work of this WG could also contribute to the development of mechanisms that give greater credit for sharing data.

Business case: The standards we develop are likely to be taken up by individual researchers, collaborative projects and institutions. Researchers in the empirical humanities are especially likely to benefit from the deliverables of this group, given the limits of existing work in the field. Building digital infrastructure to support more data sharing and collaboration in the empirical humanities is far from straightforward. Analytic techniques in the empirical humanities differ from those in social science fields that may collect similar data; they are more akin to those used in literary and philosophical research, relying primarily on hermeneutics (interpretation for explanation and evocation, rather than representative or statistical sampling for identification and validation). The goal is not to develop a concise and consistent view of an object, but to produce and explore multiple views of it, leveraging “epistemological pluralism” (Keller 1995; Turkle and Papert 1990). Indeed, providing multiple, different interpretations and representations of particular phenomena (for example, the sociocultural causes and impacts of the Fukushima nuclear disaster, or the impact of genetics research on understandings of environmental health) is the key task for humanities researchers.
Computational advances are thus required that support open-ended, underdetermined engagement with digital content, enabling (even encouraging) drift and transmutation in the way content is identified and taken up in analysis.

Metadata is particularly complex in the empirical humanities, even more so when research is collaborative. Empirical material often has limited or contested provenance information; the “empirical” itself can shift, in relevance or prevalence, as analytic structures evolve and multiply. Qualitative interviews, for example, are not just collected; they are produced through questions and other elicitation techniques developed by the interviewer (often drawing on complex traditions of thought about language, culture, and society). Interviews are then analyzed, again using analytic structures developed within complex traditions of thought. If interviews are analyzed collaboratively, different researchers may use different analytic structures, or may deploy “the same” analytic structure in different ways, and come to different interpretations of what an interview, image, or document “says.” It is thus critical to recognize, and (if researchers deem this appropriate) to make accessible and discoverable, the analytic structures through which data in the empirical humanities is both produced and interpreted. Metadata functionality therefore needs to be in place at many stages of the ethnographic research process, addressing diverse types of “data,” including the analytic structures used to produce and interpret empirical data.

Specific groups committed to taking up the deliverables of this WG include collaborative research projects on the PECE platform. Two instances of PECE, The Asthma Files (TAF-PECE) and The Disaster-STS Research Network (DSTS-PECE), will provide venues for implementing the deliverables proposed here. Both have small but active, cooperative and growing user communities.
TAF-PECE is a collaborative research project that currently has approximately ten geographically distributed users, all likely to be working on the platform daily. DSTS-PECE is an international research network that will be actively enrolling new members over the next twelve months, in groups of five to ten researchers; this incremental enrollment will provide excellent opportunities to test and improve new, embedded metadata management policies. Students at Rensselaer will also be DSTS-PECE users. Technical implementations of new metadata policies in PECE will first run on a PECE test site and then be moved to the TAF and DSTS PECE instances. Embedded metadata policies will be part of the PECE GitHub release in September 2015.

We are in communication with early-career scholars who are interested in how smart metadata practices might affect their collection of born-digital data, for example by developing informed consent forms that allow their interlocutors to make a variety of choices about how interviews will be shared. We have also been in close communication with researchers, such as Sharon Traweek and Michael M.J. Fischer, who have considerable repositories of research material and are awaiting our WG deliverables in order to digitize their research collections and make them shareable. By connecting with researchers both within and outside the RDA, this WG will have the additional benefit of broadening RDA engagement, especially within the digital humanities.
Individuals, communities and initiatives that will benefit from the proposed WG:
Researchers: by reducing psychological, institutional, political, cultural and technological barriers to digitizing and sharing data, making shared data easier to find and cite, and improving mechanisms for credit.
Collaborative research platform developers: by providing a guide to various metadata options and recommendations on the important fields to include in form-based metadata entry systems.
Interlocutors: by providing informed consent forms with a wide range of options for the sharing and dissemination of interviews.
Collections: by improving accessibility through better metadata, raising demand for archived material and helping collections better meet their missions.

3. Engagement with Existing Work in the Area
Our preliminary research suggests that researchers in history and ethnography can quickly become overwhelmed by diverse and somewhat scattered metadata standards, and that many researchers have their own ideas about the limits of existing metadata practices and standards. Historian of cartography Pat Seed, for example, is involved in efforts to define best metadata practices for maps and has noted that Dublin Core is far from sufficient. Many advanced digital projects supporting historical and ethnographic research comply with the metadata standards recommended by the Open Archives Initiative (OAI) for web content interoperability. One researcher we interviewed suggested that the OAI standards are “out-of-date,” and later elaborated: “on my comment about OSA [OAI] being out-of-date, I was talking about how the standard uses older web technology and has not been updated or changed in quite some time. If I remember correctly it's based [on an] XML encoded format, using some properties and ideas that are a little out of date. If I were to do it today, first we would want a separation [of] the data model and the data encoding. For example, many APIs allow you to get results back in JSON, XML, RSS, etc.
This separation of data model and encoding allows you to support many different encoding standards, even ones that don't exist yet. I would use RDF to model the information (a language for modeling data, not just encoding it), giving the terms and ideas we care about URIs (just like URLs you find on the web) that can be looked up and explained to any human or machine.”

This WG will examine the value (and possible limits) of encouraging community-wide compliance with Dublin Core, OAI and many other standards. We plan to partner with existing RDA groups, such as the Metadata IG, the Metadata Standards Directory WG, the Research Data Provenance IG and the Engagement IG. Individual researchers and groups within the RDA working on linked data, preservation, persistent identifiers, dynamic data citation and the long tail of research data will also be key partners. These connections will be strengthened at and beyond the RDA Fifth Plenary. Beyond the RDA, we will engage with institutions with widely respected standards (e.g. the Smithsonian), initiatives (e.g. Open Folklore) and publishers with a digital presence (e.g. the journal Cultural Anthropology). [[need more on engagement with Metadata IG, WG, etc.]]

4. Work Plan
Work Plan Components
1. Survey of relevant literature and projects in order to develop a list of interviewees and build initial use-case scenarios.
2. Ethnographic interviews with researchers in history and ethnography about the types of data for which they need metadata practices, the scenarios in which they encounter metadata decisions and (with a focus on interviews and field notes) their current practices.
3. Drafting deliverables that codify the metadata practices deemed “best” in the context of different scenarios.
4. Facilitating uptake of deliverables, initially with researchers using PECE.
5. Reporting on lessons learned in initial uptake, and working with the DPHE-IG to ensure the sustainability and evolution of the deliverables and their uptake.
6. Promoting the deliverables from this WG within the RDA and beyond.

WG Operation
The initial core members of this WG will meet weekly to ensure continual progress toward the proposed deliverables. The initial WG members have a well-established working relationship, with a record of collaborative peer-reviewed publications and presentations (at the American Anthropological Association, the Society for the Social Studies of Science, and other conferences) disseminating the results of their work. Differences of opinion and experience will be viewed as an asset within this WG and will be resolved through good communication and collaborative practice. In the spirit of “user-centered design,” this WG will also partner with developers of the PECE platform from the early stages to increase the likelihood that the deliverables meet user needs. An ongoing series of “project shares” and “issues shares” hosted by the DPHE-IG will also provide frequent opportunities for members of this WG to envision how the deliverables could feed into a wide variety of digital humanities projects. This WG will also respond to the broadly recognized need for the RDA to continue developing its engagement with social science and humanities research communities. Updates to (and input from) the broader RDA community will be provided at plenaries every six months through poster sessions, breakout groups and birds-of-a-feather sessions.

5. Initial Membership
Leadership (brief biographical notes in Appendix A)
Co-chair: Kim Fortun, Rensselaer Polytechnic Institute
Co-chair: Mike Fortun, Rensselaer Polytechnic Institute
Initial members and interested parties (based on prior discussions and involvement with the DPHE-IG):
Alison Kenner
Brandon Costelloe-Kuehn
Dan Price
Dominic Difranzo
Jason Baird Jackson
Lindsay Poirier
Luis Felipe Rosado Murillo
Sharon Traweek

6. References
[[many relevant annotations here and in the PECE Zotero, but the WG proposals I looked at tended to have very few references]]
Keller, Evelyn Fox. 1995. Reflections on Gender and Science. Yale University Press.
Riley, Jenn. “Visualizing the Metadata Universe.”
Turkle, Sherry, and Seymour Papert. 1990. “Epistemological Pluralism: Styles and Voices within the Computer Culture.” Signs 16 (1): 128–57.

7. Appendix A: Leadership Biographical Notes
Kim Fortun is a cultural anthropologist and Professor of Science & Technology Studies at Rensselaer Polytechnic Institute. Her research and teaching focus on environmental risk and disaster, and on experimental ethnographic methods and research design. Fortun is a co-chair of the DPHE-IG and is playing a lead role in the development of PECE, an open source/access digital platform for anthropological and historical research. [[[Kim Fortun will continue dialogue with the group convened by NSF to identify best practices in data management for the history and social studies of science.]]]

Mike Fortun is a historian and anthropologist of the life sciences whose research has focused on the contemporary science, culture, and political economy of genomics. His work has covered the policy, scientific, and social history of the Human Genome Project in the U.S.; the growth of commercial genomics and bioinformatics in the speculative economies of the 1990s; and the emergence of transdisciplinary research programs in toxicogenomics, addiction, and environmental health. Mike Fortun is a co-chair of the DPHE-IG and is a lead developer of PECE.

Additional material, and notes from P5 on the process of getting approved:
[[Use case doc with table here]]
[[some relevant annotations here and (soon to be) in the PECE Zotero]]
DARIAH (European): building crosswalks and an “archive in a box” (easy-to-install packages); expected to be completed in 2016...
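The interviewee’s suggestion in Section 3 (separate the data model from its encoding, and identify terms by URIs) and the crosswalk work noted here can be illustrated with a short sketch. It is hypothetical and is not PECE’s or DARIAH’s implementation: the two source formats, their field names and the example URIs are invented for illustration; only the Dublin Core terms namespace (http://purl.org/dc/terms/) is real. Records from different formats are mapped through a crosswalk into one model whose properties are URIs, and that single model is then encoded both as JSON and as RDF N-Triples. A working system would more likely use an RDF library such as rdflib; plain string formatting keeps the sketch self-contained.

```python
import json

# Dublin Core Metadata Initiative terms namespace (real); every property in the
# platform model below is identified by a URI in this namespace.
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical crosswalks: field names from two invented source formats mapped
# onto the URI-identified properties of one (hypothetical) platform model.
CROSSWALKS = {
    "oral_history_csv": {
        "Title": DCTERMS + "title",
        "Interviewer": DCTERMS + "creator",
        "Recorded": DCTERMS + "date",
        "Permissions": DCTERMS + "rights",
    },
    "photo_catalog": {
        "caption": DCTERMS + "title",
        "photographer": DCTERMS + "creator",
        "taken_on": DCTERMS + "date",
        "license": DCTERMS + "rights",
    },
}


def to_model(source_format: str, record: dict, subject_uri: str) -> dict:
    """Map a source record into the platform model using its crosswalk.

    Fields with no crosswalk entry are set aside under "unmapped" instead of
    being silently dropped.
    """
    crosswalk = CROSSWALKS[source_format]
    properties, unmapped = {}, {}
    for field, value in record.items():
        if field in crosswalk:
            properties[crosswalk[field]] = value
        else:
            unmapped[field] = value
    return {"subject": subject_uri, "properties": properties, "unmapped": unmapped}


def encode_json(model: dict) -> str:
    """Encoding 1: JSON (keys sorted for stable output)."""
    return json.dumps(model, indent=2, sort_keys=True)


def encode_ntriples(model: dict) -> str:
    """Encoding 2: RDF N-Triples, one '<s> <p> "literal" .' line per property.

    Escapes only backslashes and double quotes, so it assumes single-line values.
    Unmapped fields have no URI and are therefore not emitted as triples.
    """
    lines = []
    for prop in sorted(model["properties"]):
        value = str(model["properties"][prop])
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{model["subject"]}> <{prop}> "{literal}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical interview record in the invented "oral_history_csv" format.
    model = to_model(
        "oral_history_csv",
        {
            "Title": "Interview 12",
            "Interviewer": "A. Researcher",
            "Recorded": "2015-05-02",
            "Permissions": "Share with attribution",
            "Tape": "Side B",
        },
        "https://example.org/artifacts/interview-12",
    )
    print(encode_json(model))
    print(encode_ntriples(model))
```

Adding a new encoding (XML, RSS) would mean adding one more encoder function while the crosswalks and the model stay unchanged, which is the point of the separation the interviewee describes. Unmapped fields are set aside rather than discarded; whether and where to keep them is a policy question this sketch does not settle.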
Basic deliverable: for data type X you need at least these metadata fields; you can implement them however you want, and we can tell you how to do it in a Drupal instance…
Credit the PECE project for the empirical humanities definition.
Mention the NSF workshop Kim was just at, described in the NEH grant… Kim knows what’s going on in that field: the equivalent of an environmental scan.
Update the description of the various instances of the PECE platform.

The timing of the proposed effort is excellent. The PECE design group recently received a seed grant from Rensselaer’s Office of Research that will support a PhD student dedicated to PECE platform development for a full year. This will dramatically speed platform development and the growth of our user community, allowing extensive testing of the data management policies proposed for development here. Rensselaer seed funds will also partially support the April 2015 workshop to test PECE; a portion of this workshop can be used to vet new data management policies. Additionally, in January 2015, Kim Fortun will participate in a 3-day NSF workshop focused on data management for the historical and social studies of science (her focus area as an anthropologist). This meeting and follow-up work will allow Fortun to build on and contribute to up-to-date modeling of best practices for data management in the empirical humanities.

3rd phase: this can serve as a model for broader adoption of these RDA outcomes in the empirical humanities.
What do the practical policies group already offer that this will build on? And the Metadata IG? Differences?
1. Contextual metadata extraction: PECE, like many EDH projects, needs to be interoperable with data sources employing diverse metadata formats; we will be using RDF to map these into the platform’s metadata model.
If we keep the credit language: To deal with attribution when PECE contributions (artifacts, analytic structures, etc.) are used by other researchers.
The author of original contributions is now recognized, and attribution guidelines are provided. We don’t yet have the technical means to recognize extensions of analytic structures (which are key to the collaborative process for ethnographers). Further refinement of how contributions to these structures (and to the PECE platform generally) are tracked and credited will be important going forward, as it will shape researchers’ interest in collaboration. It must be noted that collaboration among ethnographers will have to be actively encouraged, given the extreme individualist tradition of work in the field in recent decades.

Recommended:
1. Contact enquiries@rd-alliance.org; a Secretariat Liaison will be assigned to our group. We really really really want you to contact us before you start.
2. Put together the charter.
Review criteria:
1. Are there measurable outcomes? How to measure? Not just a report/doc, but something that can be used and adopted… but can’t reports/docs be adopted? More tangible than measurable? The WG on WDF certification is an example of a non-technical group.
2. Will the outcomes be taken up by the intended community?
3. Will the outcomes foster data sharing and/or exchange?
4. Can the proposed work, outcomes/deliverables and action plan described be accomplished in 12-18 months?
5. Is the scope too large for effective progress, too small for an RDA effort, or not appropriate for the RDA?
6. Overall, is this worthwhile for the RDA? Does it add value above what is currently being done within the community?
Review process: community review (4 weeks), TAB review (2 weeks), Council review (2-4 weeks).
After review, the Secretariat Liaison will facilitate building communication, recording, etc. Joint activities with RDA affiliates are encouraged.
See Margaret’s comments on metadata (data on the conditions of the gathering/producing of the data) vs. annotations. http://www.mndigital.org/digitizing/standards/metadata.pdf
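The “basic deliverable” noted above (for data type X, at least these metadata fields, implemented however a project likes) could be checked with very little code. In this sketch the data types and field lists are placeholders, not the WG’s recommendations, which remain to be determined; a platform built on Drupal would more likely rely on its own required-field settings in the entry form.

```python
# Placeholder minimum field lists per data type. These are NOT the WG's
# recommendations (still "X" in this draft); they only illustrate the shape
# of a "for data type X you need at least these fields" deliverable.
REQUIRED_FIELDS = {
    "interview": {"title", "creator", "date", "rights", "consent"},
    "field_notes": {"title", "creator", "date", "location"},
}


def missing_fields(data_type: str, record: dict) -> list:
    """Return the required fields that are absent or empty, sorted for stable output.

    Extra fields are allowed: projects may add fields beyond the minimum.
    """
    required = REQUIRED_FIELDS[data_type]
    return sorted(name for name in required if not record.get(name))


if __name__ == "__main__":
    record = {"title": "Interview 12", "creator": "A. Researcher", "date": "", "language": "en"}
    print(missing_fields("interview", record))  # ['consent', 'date', 'rights']
```

Because extra fields pass through untouched, researchers stay free to add fields to the recommended minimum, as the proposal anticipates; only subtraction is flagged.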