March 30 Metadata WG Draft

Working Group on Metadata for
Researcher-Created Primary Data
in History and Ethnography
Case Statement Proposal
CASE STATEMENT PROPOSAL FOR THE METADATA FOR RESEARCHER-CREATED PRIMARY DATA IN HISTORY AND ETHNOGRAPHY WORKING GROUP
Contents
1. WG Charter
a. Short-term goals (M6)
b. Mid-term goals (M12)
c. Long-term goals (M18)
d. Timeframe
2. Value Proposition
a. Research case
b. Business case
3. Engagement with Existing Work in the Area
4. Work Plan
a. Work plan components
b. WG operation
5. Initial Membership
6. References
7. Appendix A: Leadership Biographical Notes
1. WG Charter
The Metadata for Researcher-Created Primary Data in History and Ethnography WG will
conduct research, develop a statement of best practices and release an adoptable product
centered on what needs to be in place (standards, protocols, policies, cultural
expectations) to make ethnographic and historical data archivable, discoverable and
shareable. In an initial phase we will identify and categorize a broad range of use cases
within history and ethnography, surveying the relevant literature and conducting
ethnographic interviews asking about the many kinds of data (such as recorded
interviews, field notes, and photographs, among others) that require metadata to be
archived and shared in a number of different user scenarios (born-digital interviews
produced by early career researchers, for example, or newly digitized data from more
established researchers). A second phase will document and analyze some of the diverse
metadata practices for a sampling of the use cases and scenarios we identify, again
through a literature review, an environmental scan of projects and interviews with project
leads. Finally, we will propose best metadata practices for a variety of use cases/scenarios
and facilitate the uptake of these deliverables within a number of projects using the
Platform for Experimental and Collaborative Ethnography (PECE). Each phase is
expected to take six months. Early adopters of this WG’s deliverables will include
research groups working with PECE, including The Asthma Files and The Disaster-STS
Research Network. Beyond the 18-month timeframe of this working group, lessons learned
from these early adopters will feed into the RDA Digital Practices in History and
Ethnography (DPHE) Interest Group’s continued dissemination of these best practices.
Our deliverables will make digital artifacts in ethnographic and historical research much
easier to share, find, use and cite effectively, perhaps even contributing to the
development of a credit/reward structure that would not only reduce barriers, but further
incentivize the sharing of data in the digital humanities.
1a. Short-term goals (M6)
In our first six months we will review a wide range of use cases, identifying scenarios
that historians and ethnographers (within and beyond RDA) encounter when working
with metadata for shared artifacts. Preliminary work conducted within the RDA IG for
Digital Practices in History and Ethnography (DPHE-IG) suggests that researchers often
struggle to develop appropriate metadata practices when digitizing and sharing the
following data types especially important to history and ethnography: field notes,
interviews (audio and video), grey literature, images, analytic structures, structured
annotations, surveys, maps, quantitative data, bibliographies, translations and workflows.
Of these, we have identified X data types for this working group to engage with. This
first phase of our project will provide greater details on this wide range of data use and
re-use scenarios. We will disseminate the findings from this first phase of our work in the
form of a table with information on different artifacts and a range of associated user
scenarios, source metadata, provenance metadata, technical metadata and intellectual
property metadata.
1b. Mid-term goals (M12)
Our second phase will document and analyze the diverse existing metadata practices of
researchers in history and ethnography. In order to scope this project appropriately for
our timeline, we will focus primarily on X. In our analysis, we will identify best practices
that could be codified and distributed as the second component of our deliverables.
1c. Long-term goals (M18)
In our final phase of work we will facilitate the uptake of our deliverables from the
second phase (best practices) in two projects using the PECE platform: The Asthma Files
and The Disaster-STS Research Network. Users will be able to use a simple form-based
interface to input the relevant metadata for artifacts such as images, documents, audio
and video. In order to capture metadata for various analytic structures, PECE developers
will establish micro-attribution vocabularies that capture the complex provenance of a
particular analytic. We aim to learn lessons in this implementation phase that will help
facilitate uptake in further platforms and projects beyond the 18-month timeframe,
with sustainable support provided by DPHE-IG.
1d. Timeframe
Phase one, documenting and analyzing use cases for metadata in history and
ethnography, will run from April 2015 through September 2015, with findings presented
at the RDA Sixth Plenary. From October 2015 through March 2016 we will collect data
on the many ways researchers in history and ethnography address metadata issues, with a
focus on interviews and field notes. Finally, we will codify best practices and work with
researchers using the PECE platform to adopt these deliverables from April 2016 through
September 2016.
2. Value Proposition
Research case: Given the cultural and social complexity (as well as technical, ecological
and economic complexity) of many global problems today, collaborative empirical
humanities research has renewed urgency. The empirical humanities include history,
folklore, cultural anthropology and other fields in which researchers collect primary data
of different types that can be used for cultural analysis. Today, these researchers often
need to collaborate to understand phenomena that operate across geographic regions,
scale, and communities of people. But established research practices and infrastructures
in the empirical humanities do not support this. For decades, research in these fields has
been an almost entirely individual-centric enterprise. Field notes, found documents,
found or researcher-created photographs or recordings, and other data used in cultural
analysis are very rarely shared, except when reduced or rendered into some form of
publication or museum display.
One of the primary barriers to sharing data within the empirical humanities is the lack of
agreed-upon metadata standards and protocols for researcher-created primary research data.
While there has been a great deal of work in the cultural heritage arena, especially within
museums and libraries, the proliferation __________. In the cultural heritage sector, for
example, Jenn Riley has identified 105 standards and notes that “the sheer number of
metadata standards in the cultural heritage sector is overwhelming, and their interrelationships further complicate the situation.” In contrast, the RDA Metadata Standards
Directory WG lists only one standard for heritage studies and one for anthropology
(Open Archives Initiative; see the Engagement with Existing Work in the Area section
below for critical commentary). Many researchers find themselves caught in the
confusing space between the dizzying proliferation of standards and a one-size-fits-all
approach that can miss out on the diversity of data practices within disciplines. We will
produce a simple list of recommended metadata fields for a delimited set of artifact types,
analytics and use cases. Once endorsed by the RDA, and taken up by early adopters,
these best practices will be a go-to resource for researchers, who may then choose to
modify (add or subtract) the fields we suggest. Development and uptake of shared
metadata practices and tools will make user-created research data more findable and
usable within these research traditions. The work of this WG could also contribute to the
development of mechanisms providing greater credit for sharing data.
Business Case: The standards we develop are likely to be taken up by individual
researchers, people working on collaborative projects and institutions. Researchers in the
empirical humanities are especially likely to benefit from the deliverables of this group
due to limits of existing work in the field. Building digital infrastructure to support more
data sharing and collaboration in the empirical humanities is far from straightforward.
Analytic techniques in the empirical humanities differ from those in social science fields
that may collect similar data, and are more akin to those used in literary and philosophical
research, relying primarily on hermeneutics (interpretation for explanation and evocation
rather than representative or statistical sampling for identification and validation). The
goal is not to develop a concise and consistent view of an object, but to produce and
explore multiple views of an object, leveraging “epistemological pluralism” (Keller 2002;
Turkle and Papert 1990). Indeed, providing multiple, different interpretations and ways
of representing particular phenomena (the sociocultural causes and impacts of the
Fukushima nuclear disaster, or the impact of genetics research on understandings of
environmental health, for example) is the key task for humanities researchers.
Computational advances are thus required that support open-ended, underdetermined
engagement with digital content, and that enable (even encourage) drift and transmutation
in the way content is identified and taken up in analysis.
Metadata is particularly complex in the empirical humanities, even more so when
research is collaborative. Empirical material often has limited or contested provenance
information; the “empirical” itself can shift, in relevance or prevalence, as analytic
structures evolve and multiply. Qualitative interviews, for example, are not just
collected; they are produced through questions and other elicitation techniques
developed by the interviewer (often drawing on complex traditions of thought about
language, culture, and society). Interviews are then analyzed, again using analytic
structures developed within complex traditions of thought. If interviews are analyzed
collaboratively, different analytic structures may be used by different researchers, or
different researchers may deploy “the same” analytic structure in different ways, and
come to different interpretations of what an interview, image, or document “says.” It is
thus critical to recognize – and make accessible and discoverable (if researchers deem
this appropriate) – the analytic structures through which data in the empirical
humanities is both produced and interpreted. Metadata functionality thus needs to be in
place at many stages in the ethnographic research process, addressing diverse types of
“data”—including analytic structures used to produce and interpret empirical data.
Specific groups committed to taking up the deliverables of this WG include
collaborative research projects on the PECE platform. Two instances of PECE – The
Asthma Files (TAF-PECE) and The Disaster-STS Research Network (DSTS-PECE) –
will provide venues for the implementation of the deliverables proposed here. Both
have small but active, cooperative and growing user communities. TAF-PECE is a
collaborative research project that currently has approximately ten users in
geographically distributed locations, all likely to be working on the platform on a daily
basis. DSTS-PECE is an international research network that will be actively enrolling
new members over the next twelve months, in groups of five to ten researchers. This
incremental enrollment will provide excellent opportunities to test and
improve new, embedded metadata management policies. Students at Rensselaer will
also be DSTS-PECE users. Technical implementations of new metadata policies in
PECE will first run on a PECE test site, then be moved to the TAF and DSTS PECE
instances. Embedded metadata policies will be part of the PECE GitHub release in
September 2015.
We are in communication with early-career scholars who are interested in how smart
metadata practices might affect their collection of born-digital data, for example by
developing informed consent forms that allow their interlocutors to make a variety of
choices about how interviews will be shared. We have also been in close
communication with researchers, such as Sharon Traweek and Michael M.J. Fischer,
who hold considerable repositories of research material and are awaiting our WG
deliverables in order to digitize and share their research collections. By
connecting to researchers both within and outside of the RDA, a tangential benefit of
this WG will be to broaden RDA engagement, especially within the digital
humanities.
Individuals, communities and initiatives that will benefit from the proposed WG:
Researchers: by reducing psychological, institutional, political, cultural and
technological barriers to digitizing and sharing data, making shared data easier
to find and cite and improving mechanisms for credit.
Collaborative research platform developers: by providing a guide to various
metadata options and recommendations on the important fields to include in
form-based metadata entry systems.
Interlocutors: by providing informed consent forms with a wide range of
options for sharing and dissemination of interviews.
Collections: better metadata will improve accessibility, raising demand for
archived material, helping collections better meet their mission.
3. Engagement with Existing Work in the Area
Our preliminary research suggests that researchers in history and ethnography can
quickly become overwhelmed by the diverse and somewhat scattered metadata
standards and that many researchers have their own ideas about the limits of existing
metadata practices and standards. Historian of cartography Pat Seed, for example, is
involved in efforts to define best metadata practices for maps and has noted that Dublin
Core is far from sufficient. Many advanced digital projects supporting historical and
ethnographic research comply with the metadata standards recommended by the Open
Archives Initiative for web content interoperability. One researcher we interviewed
suggested that the OAI standards are “out-of-date,” and followed up with, “on my
comment about OAI being out-of-date, I was talking about how the standard uses
older web technology and has not been updated or changed in quite some time. If I
remember correctly it's based [on] an XML encoded format, using some properties and ideas
that are a little out of date. If I were to do it today, first we would want a separation [of] the
data model and the data encoding. For example, many APIs allow you to get results
back in JSON, XML, RSS, etc. This separation of data model and encoding allows you
to support many different encoding standards, even ones that don't exist yet. I would
use RDF to model the information (a language for modeling data, not just encoding it),
giving the terms and ideas we care about URIs (just like URLs you find on the web)
that can be looked up and explained to any human or machine.” This WG will examine
the value (and possible limits) of encouraging community-wide compliance with
Dublin Core, OAI and many other standards.
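The interviewee's point about separating the data model from its encoding can be illustrated with a minimal sketch. It is not the OAI protocol, not RDF, and not PECE code: the ArtifactRecord class, its field names and the example.org identifier are all hypothetical, and a plain Python dataclass stands in for a real modeling language. The sketch only shows one model being serialized into two encodings without the model itself changing.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict


@dataclass
class ArtifactRecord:
    """Hypothetical data model: what is described, independent of any encoding."""
    identifier: str      # a URI that names the artifact (placeholder value below)
    title: str
    creator: str
    artifact_type: str


def to_json(record: ArtifactRecord) -> str:
    """Encode the record as a JSON object."""
    return json.dumps(asdict(record), sort_keys=True)


def to_xml(record: ArtifactRecord) -> str:
    """Encode the same record as XML, one child element per field."""
    root = ET.Element("artifact")
    for name, value in asdict(record).items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only; nothing here comes from a real collection.
record = ArtifactRecord(
    identifier="https://example.org/artifact/1",
    title="Interview with a community health worker",
    creator="A. Researcher",
    artifact_type="interview",
)
```

Supporting a further encoding would mean adding another function alongside to_json and to_xml; the record definition would stay the same, which is the property the interviewee describes.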
We plan to partner with existing RDA Groups, such as the Metadata IG, the Metadata
Standards Directory WG, the Research Data Provenance IG and the Engagement IG.
Individual researchers and groups within the RDA working on linked data,
preservation, persistent identifiers, dynamic data citation and the long tail of research
data will also be key partners. These connections will be strengthened at and beyond
the RDA Fifth Plenary.
Beyond the RDA, we will engage with institutions with widely respected standards (e.g.
the Smithsonian), initiatives (e.g. Open Folklore) and publishing bodies with a digital
presence (e.g. the Journal of Cultural Anthropology).
[[need more on engagement with Metadata IG, WG, etc.]]
4. Work Plan
Work Plan Components
1. Survey of relevant literature and projects in order to develop a list of
interviewees and build initial use-case scenarios.
2. Ethnographic interviews with researchers in history and ethnography on the
types of data for which they need metadata practices, the scenarios in which they
encounter metadata decisions and (with a focus on interviews and field notes)
their practices.
3. Drafting deliverables in order to codify metadata practices deemed “best” in the
context of different scenarios.
4. Facilitating uptake of deliverables, initially with researchers using PECE.
5. Reporting on lessons learned in initial uptake, and working with the DPHE-IG
to ensure sustainability and evolution of the deliverables and their uptake.
6. Promoting the deliverables from this WG within the RDA and beyond.
WG Operation
The initial core members of this WG will meet weekly to ensure continual development
towards the proposed deliverables. The initial WG members have a well-established
working relationship, with a record of collaborative peer-reviewed publications and
presentations (at the American Anthropological Association, the Society for the Social
Studies of Science, and other conferences) disseminating the results of their work.
Differences of opinion and experience will be viewed as an asset within this WG, and
will be resolved through good communication and collaboration practices.
In the spirit of “user-centered design,” this WG will also partner with developers of the
PECE platform from the early stages to increase the likelihood of the deliverables
meeting user needs. An ongoing series of “project shares” and “issues shares” hosted by
the DPHE-IG will also provide frequent opportunities for members of this WG to
envision how the deliverables could feed into a wide variety of digital humanities
projects. This WG will be a vehicle for the broadly understood need for RDA to continue
developing engagement with social science and humanities research communities.
Updates to (and input from) the broader community of RDA will be provided at plenaries
every six months in the form of poster sessions, breakout groups and birds of a feather
sessions.
5. Initial Membership
Leadership (brief biographic notes in Appendix A)
Co-chair: Kim Fortun, Rensselaer Polytechnic Institute
Co-chair: Mike Fortun, Rensselaer Polytechnic Institute
Initial Members/Interested (based on prior discussions and involvement with the
DPHE-IG)
Alison Kenner
Brandon Costelloe-Kuehn
Dan Price
Dominic Difranzo
Jason Baird Jackson
Lindsay Poirier
Luis Felipe Rosado Murillo
Sharon Traweek
6. References
[[many relevant annotations here and in the PECE Zotero, but the WG proposal I looked
at tended to have very few references]]
Keller, Evelyn Fox. 1995. Reflections on Gender and Science. Yale University Press.
Turkle, Sherry, and Seymour Papert. 1990. “Epistemological Pluralism: Styles and Voices
within the Computer Culture.” Signs 16 (1): 128–57.
Riley, Jenn. “Visualizing the Metadata Universe.”
7. Appendix A: Leadership Biographical Notes
Kim Fortun is a cultural anthropologist and Professor of Science & Technology Studies
at Rensselaer Polytechnic Institute. Her research and teaching focus on environmental
risk and disaster, and on experimental ethnographic methods and research design. Fortun
is a co-chair of the DPHE-IG and is playing a lead role in the development of PECE, an open
source/access digital platform for anthropological and historical research. [[[Kim Fortun
will continue dialogue with the group convened by NSF to identify best practices in data
management for the history and social studies of science.]]]
Mike Fortun is a historian and anthropologist of the life sciences whose research has
focused on the contemporary science, culture, and political economy of genomics. His
work has covered the policy, scientific, and social history of the Human Genome Project
in the U.S.; the growth of commercial genomics and bioinformatics in the speculative
economies of the 1990s; and the emergence of transdisciplinary research programs in
toxicogenomics, addiction, and environmental health. Mike Fortun is a co-chair of the
DPHE-IG and is a lead developer of PECE.
Additional material, and notes from P5 on the process of getting approved:
[[Use case doc with table here]]
[[some relevant annotations here and (soon to be) in the PECE Zotero]]
DARIAH (European) building crosswalks, and "archive in a box": easy-to-install packages...
expected to be completed in 2016...
Basic deliverable: For X data type you need at least these metadata fields, you can
do that however you want, we can tell you how to do it in a Drupal instance…
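One way to picture this basic deliverable is as a per-type list of minimum fields that any platform can check against. The sketch below is only an illustration under stated assumptions: the data types and field names in REQUIRED_FIELDS are invented placeholders, not the fields this WG will recommend, and missing_fields is a hypothetical helper rather than part of PECE or Drupal.

```python
# Hypothetical minimum field sets per data type; the real lists would come
# from the WG's best-practice deliverable, not from this sketch.
REQUIRED_FIELDS = {
    "interview": {"title", "creator", "date_recorded", "consent_terms"},
    "field_note": {"title", "creator", "date_written"},
}


def missing_fields(artifact_type: str, metadata: dict) -> set:
    """Return the required fields that are absent or empty for this data type."""
    required = REQUIRED_FIELDS[artifact_type]
    # A field counts as present only if it has a non-empty value.
    present = {key for key, value in metadata.items() if value not in (None, "")}
    return required - present


# Example: an interview record with no consent terms and an empty date.
draft = {"title": "Interview 12", "creator": "A. Researcher", "date_recorded": ""}
gaps = missing_fields("interview", draft)
```

How the fields are stored is left open, in keeping with the note above; the check only needs a mapping from field name to value.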
Credit the PECE project for the empirical humanities definition.
Mention NSF workshop Kim was just at, described in the NEH grant… Kim knows
what’s going on in that field. the equivalent of an environmental scan.
update description of the various instances of the PECE platform.
The timing of the proposed effort is excellent. The PECE design group recently
received a seed grant from Rensselaer’s Office of Research that will support a PhD
student dedicated to PECE platform development for a full year. This will
dramatically speed platform development and growth of our user community,
allowing for extensive testing of the data management policies proposed for
development here. Rensselaer seed funds will also partially support the April 2015
workshop to test PECE; a portion of this workshop can be used to vet new data
management policies. Additionally, in January 2015, Kim Fortun will participate in
a 3-day NSF workshop focused on data management for the historical and social
studies of science (her focus area as an anthropologist). This meeting and follow up
work will allow Fortun to build on and contribute to up-to-date modeling of best
practices for data management in the empirical humanities.
3rd phase; This can serve as a model for broader adoption of these RDA outcomes
in the empirical humanities.
What do the practical policies group already offer that this will build on? And the
metadata IG? Differences?
1. Contextual metadata extraction: PECE, like many EDH projects, needs to be
interoperable with data sources employing diverse metadata formats; we will be
using RDF to map these into the platform's metadata model.
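The mapping idea can be sketched as a crosswalk table that translates field names from each source format into one internal vocabulary. This is an assumption-laden illustration, not the PECE implementation and not actual RDF processing: plain dictionaries stand in for an RDF mapping, the "legacy_spreadsheet" format and the internal field names (title, creator, date_created) are hypothetical, and only the three Dublin Core term URIs are real identifiers.

```python
DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core terms namespace

# Crosswalks from source field names to hypothetical platform field names.
CROSSWALKS = {
    "dublin_core": {
        DCTERMS + "title": "title",
        DCTERMS + "creator": "creator",
        DCTERMS + "created": "date_created",
    },
    "legacy_spreadsheet": {  # invented source format, for illustration only
        "Name": "title",
        "Author": "creator",
        "Date": "date_created",
    },
}


def to_platform_model(source_format: str, record: dict) -> tuple:
    """Split a source record into mapped fields and fields with no known mapping."""
    mapping = CROSSWALKS[source_format]
    mapped, unmapped = {}, {}
    for key, value in record.items():
        if key in mapping:
            mapped[mapping[key]] = value
        else:
            # Keep unmapped fields instead of silently dropping them.
            unmapped[key] = value
    return mapped, unmapped


source = {
    DCTERMS + "title": "Field notes, week 1",
    DCTERMS + "creator": "A. Researcher",
    "http://example.org/custom": "not in the crosswalk",
}
mapped, unmapped = to_platform_model("dublin_core", source)
```

Returning the unmapped fields separately is a deliberate choice here: contextual metadata that has no home in the internal model stays visible for later review.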
If we keep the credit language:
To deal with attribution when PECE contributions (artifacts, analytic structures,
etc) are used by other researchers. The author of original contributions is now
recognized, and attribution guidelines are provided. We don’t yet have the
technical means to recognize extensions of analytic structures (which is key to the
collaborative process for ethnographers). Further refinement of how contributions
to these structures (and to the PECE platform generally) are tracked and credited
will be important going forward as it will shape researchers’ interest in
collaboration. It must be noted that collaboration among ethnographers will have to
be actively encouraged given the extreme individualist tradition of work in the field
in recent decades.
recommended:
1. contact enquiries@rd-alliance.org, a Secretariat Liaison will be assigned to
our group. we really really really want you to contact us before you start.
2. put together charter
review criteria:
1. are there measurable outcomes? - how to measure? not just a report/doc, but
something that can be used and adopted… but can’t report/docs be adopted?
more tangible than measurable? WG on WDF certification is example of
non-technical group.
2. will the outcomes be taken up by the intended community
3. will the outcomes foster data sharing and/or exchange?
4. can the proposed work, outcomes/deliverables, and action plan described be
accomplished in 12-18 months?
5. is the scope too large for effective progress, too small for an RDA effort, or not
appropriate for the RDA?
6. overall is this worthwhile for the RDA? does it add value above what is
currently being done within the community?
community review 4 wks
TAB review (2 weeks)
Council Review (2-4 weeks)
after review: secretariat liaison will facilitate building communication/recording/etc.
joint activities with RDA affiliates encouraged.
See Margaret’s comments on metadata (data on the conditions of the
gathering/producing of the data) vs. annotations.
http://www.mndigital.org/digitizing/standards/metadata.pdf