iSchool-greenberg

Theories of Evolution and Cultural Diffusion: The Dryad
Repository Case Study for Understanding Changes in
Organizing Information Practices
XXX
1st author's affiliation
1st line of address
2nd line of address
Telephone number, incl. country code
1st author's email address
ABSTRACT
Digital networked and graphical technology is having a significant
impact not only on the processes for creating, disseminating, and
sharing information, but also on the practices for organizing
information. These changes need to be studied and understood in
order to keep pace with new developments and to consider how they
may improve information organization. This paper considers theories
of evolution and cultural diffusion for explaining and
contextualizing the change in organizing information practices.
The paper discusses these theories and their value for studying
change. A case study focuses on the Dryad repository for data
underlying published research in the field of evolutionary biology.
This inquiry indicates that the theories of evolution and cultural
diffusion are applicable to Dryad and reflective of change.
Categories and Subject Descriptors
H.3 INFORMATION STORAGE AND RETRIEVAL
General Terms
Design, Standardization, Theory.
Keywords
Organizing information, Evolution, Cultural diffusion, Digital
repository.
1. INTRODUCTION
Traditional physical barriers for creating, storing, and
disseminating intellectual content are being eliminated by
pervasive networked information technologies supporting
instantaneous production and sharing of digital output in nearly
any format. These developments have motivated the following
seemingly significant changes in organizing information practices:
1. an increase in access to a wide variety of systems for
organizing information; 2. ongoing development of new systems
for organizing information—ranging from folksonomies to
formalized ontologies; and 3. a remarkable growth in the
diversity of individuals, well beyond traditional information
professionals, using these systems for organizing information.
One of the most visible places to observe change in organizing
information is the social computing arena (e.g., Flickr, MySpace,
and Connotea), where individuals tag not only for personal use but
also to contribute to a community’s collective knowledge structure.
Social computing spans non-academic and academic
communities, and enables content creators and viewers to
represent and organize content via folksonomic tagging and
through indexing by common attributes, such as date, title, and
creator (author). These approaches together can be grouped
under the concept of metadata: structured data about data. Social
computing and personal digital information applications are also
supporting more formal metadata schemes. For example, many
digital cameras and photo-posting sites like Flickr support the
Exchangeable Image File Format (EXIF), a specification for
image files, and iTunes and Windows Media Player, among other
music software applications, support the ID3 standard for MP3
audio files. (The ID3 acronym was inspired by the decision
tree algorithm developed by Ross Quinlan, although the acronym
does not have an official expansion in the context of MP3
metadata.) The burgeoning digital repository environment for
scholarly output in multiple forms appears to have also cultivated
significant change in organizing information practices, as many
such repositories request or even require that contributors
(generally content creators) depositing their work help curate
and create metadata describing their contributions.
The developments noted here feed O’Reilly’s notion of Web 2.0
[1], and exhibit aspects of Library 2.0 [2], such as innovation and
a participatory community.
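As a minimal sketch of how personal tags can add up to a community’s collective structure, the Python below aggregates per-user tag assignments into per-item tag counts, i.e., a simple folksonomy. The users, items, and tags are invented for illustration, and the trim-and-lowercase normalization is an assumption, not a description of how Flickr, MySpace, or Connotea actually process tags.

```python
from collections import Counter, defaultdict

# Hypothetical tag assignments recorded as (user, item, tag) triples.
# All values are invented for illustration.
TAGGINGS = [
    ("ana", "photo-17", "Sunset"),
    ("ben", "photo-17", "sunset "),
    ("ben", "photo-17", "beach"),
    ("cai", "photo-17", "Beach"),
    ("cai", "photo-17", "vacation"),
    ("ana", "photo-42", "cat"),
]


def normalize(tag: str) -> str:
    """Assumed normalization: trim surrounding whitespace and lowercase."""
    return tag.strip().lower()


def folksonomy(taggings):
    """Aggregate personal tags into community tag counts for each item."""
    by_item = defaultdict(Counter)
    for _user, item, tag in taggings:
        by_item[item][normalize(tag)] += 1
    return by_item


community = folksonomy(TAGGINGS)
print(dict(community["photo-17"]))  # {'sunset': 2, 'beach': 2, 'vacation': 1}
```

Each individual tags for their own purposes, yet the aggregated counts already behave like a shared, if loosely controlled, index term list for the item.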
These developments provoke
important questions about the changes in organizing information
practices and the underlying systems, including the creation and
use of knowledge organization systems (KOS). How might we
explain or make sense of this change? What theoretical constructs
may help us to understand the features and functionalities being
presented in evolving information systems? Exploring the
transformation in organizing information systems can help
information science professionals understand change and develop
approaches that facilitate effective change, leading to greater
productivity. Moreover, for research specifically, theory is crucial
for identifying significant research questions that advance our field.
This paper considers these questions via a case study approach
and explores changes in organizing information systems in the
context of evolution and cultural diffusion. The case study is
presented with Dryad (http://www.datadryad.org/), a digital
repository for data underlying research publications in
evolutionary biology and related fields. The case study interprets
organizing information systems as schemes ranging from
somewhat anarchistic folksonomies to formalized ontological
structures. An information system here refers to a framework
supporting the acquisition/ingestion, representation, and
search/retrieval of information objects.
Factors driving this transformation include 1. technological
innovation, 2. the “networked” information infrastructure, 3. the
enormous amount of information people create and obtain daily,
4. the natural inclination to manage information so it can be found
and effectively used, and 5. a growing and diverse population of
people involved in organizing information.
The rest of the paper is organized as follows. The next section
gives an overview of the theories of evolution and cultural
diffusion as constructs for studying change. The paper then
introduces the Dryad project, reporting on the project’s status
and goals. Questions guiding an initial inquiry into the
applicability of the theories of evolution and cultural diffusion
are presented, and they are explored as a way to understand
organizing information practices within Dryad. The paper
concludes with brief remarks about next steps and the importance
of exploring theory to understand change in organizing
information.
2. Theories of Evolution and Cultural Diffusion
Proposing that the theories of evolution and cultural
diffusion may help us understand changes in systems for
organizing information seems a bit lofty, particularly without the
support of empirical data. Despite this observation, there is value
in first articulating that existing theories of “change” may in fact
help us understand the transformation occurring in organizing
information systems, and that observable evidence via a case
study can provide grounding for further research. In other words,
if this case study demonstrates the applicability of these theories,
information science researchers may be encouraged to further
explore these theories and be inspired to identify new and
important research questions germane to our evolving field. This
paper presents a logical first step by defining the theories of
evolution and cultural diffusion and demonstrating changes
supporting these theories observed while developing the Dryad
repository.
2.1 Evolutionary theory
The study of evolution is grounded in biological science
and the idea that all living species have common ancestors. A
simplistic example is family relationships, in which cousins have
a common grandmother and inherit certain genes (e.g., the ability
to have blue eyes). The evolutionary mechanism most familiar to
the average citizen is natural selection, a process whereby
characteristics favorable for survival become more common in a
species over time, popularized in Darwin’s On the Origin of
Species [3]. Adaptation is another fairly familiar evolutionary
concept. People who have lived in extremely cold climatic regions
(e.g., Alaska and Siberia) for thousands of years demonstrate
adaptation by having developed short, round bodies, short arms
and legs, and other characteristics that minimize heat loss and
help them endure extreme cold and wind. The physique of people
living in such regions differs from that of populations descended
from shared ancestors who migrated to warmer and more arid
climates.
Although the study of evolution is grounded in biology, it is an
interdisciplinary field drawing from a wide range of scientific
disciplines (ecology, paleontology, population genetics,
physiology, systematics, and newer biological subdisciplines such
as genomics). Scientific study focuses on phenotype, the
observable characteristics, traits, and behaviors of an organism,
including the mapping of a species’ morphological development;
and genotype, the genetic composition of an organism or cell.
Our popular understanding of evolution, informed by Darwin as
noted above, is grounded more in phenotype, although not
exclusively. Many disciplines, such as mathematics and computer
science, have looked to evolution to develop their work; genetic
algorithms, for example, mimic selection to rule out weaker
candidate solutions and derive an optimal one [4].
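To make the borrowing from evolution concrete, here is a minimal genetic algorithm sketch in Python on the textbook “OneMax” toy problem (maximize the number of 1 bits in a string). The problem, population size, mutation rate, tournament selection, and elitism are illustrative assumptions, not details taken from [4]; the sketch only shows selection, inheritance (crossover), and variation (mutation) acting on a population over generations.

```python
import random

GENOME_LEN = 20       # bits per candidate (illustrative)
POP_SIZE = 30         # candidates per generation (illustrative)
MUTATION_RATE = 0.02  # per-bit flip probability (illustrative)


def fitness(genome):
    """OneMax: count of 1 bits; the optimum equals GENOME_LEN."""
    return sum(genome)


def tournament(pop, rng, k=3):
    """Selection: the fittest of k randomly sampled candidates wins."""
    return max(rng.sample(pop, k), key=fitness)


def crossover(a, b, rng):
    """Inheritance: the child takes a prefix of one parent and the suffix of the other."""
    cut = rng.randrange(1, GENOME_LEN)
    return a[:cut] + b[cut:]


def mutate(genome, rng):
    """Variation: flip each bit independently with probability MUTATION_RATE."""
    return [1 - g if rng.random() < MUTATION_RATE else g for g in genome]


def evolve(generations=40, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
    best_history = []
    for _ in range(generations):
        elite = max(pop, key=fitness)  # elitism: carry the current best over unchanged
        children = [
            mutate(crossover(tournament(pop, rng), tournament(pop, rng), rng), rng)
            for _ in range(POP_SIZE - 1)
        ]
        pop = [elite] + children
        best_history.append(fitness(max(pop, key=fitness)))
    return max(pop, key=fitness), best_history


best, history = evolve()
print(fitness(best), history[:5])
```

Because the best candidate is always carried forward, the best fitness never decreases from one generation to the next; weaker candidates are gradually ruled out.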
A question to consider is whether aspects of evolutionary theory
can explain changes in organizing information systems. Information
systems are not living entities depending on food for
sustainability, although they are operational systems with a
lifespan. We study and develop information systems to support
the life-cycle of digital information, and researchers have even
defined a metadata life cycle [5]. Information systems connect to,
and are often related to, other information systems and structures
in the larger information infrastructure. For example, the World
Health Organization’s WHOSIS (WHO Statistical Information System)
(http://www.who.int/whosis/en/) links to health-related statistical
information found across the globe, and one can query the system
to construct tables for any combination of countries. This linking,
and the ability to integrate and develop new resources, may
represent a form of ancestry. These initial observations, it seems,
warrant the exploration of evolutionary theory in relation to
changes in organizing information systems.
2.2 Cultural Diffusion
Cultural diffusion can be defined as the spread of ideas, material
objects, and behaviors (e.g., eating, talking, dressing) from one
society to another [6]. Cultural diffusion is spurred by cultural
domination or force, direct communication, and indirect means [7].
Cultural imperialism, the forcing of the behavior, customs, or
language of one culture on another, is an example of cultural
domination. It is difficult to see the applicability of domination
to organizing information systems, although it may be possible to
argue that domination exists if a vendor corners the market. More
probable are direct and indirect paths of cultural diffusion
(among subcultures or communities) fostering change in
organizing information systems.
The phenomenal growth in information technology over the last
30 years has allowed many subcultures to thrive and new
communities to develop. Facebook, a social network web site
initially developed for college students, is one example of such a
new community. There is clear evidence of direct diffusion
(communication between information professionals and
educators) as part of the development of many educational
software packages, which, as a result, often include educational
metadata standards and vocabularies for representing educational
resources. Perhaps more a result of indirect diffusion, scientific
communities are developing knowledge representation systems in
the form of sophisticated ontologies for topics such as genetic
sequencing and proteomics. Indirect diffusion is proposed here
because the development of ontologies mimics, to some degree,
the development of thesauri, although often scientific
communities are not connected with the information and library
science community.
Information/library scientists demonstrate a predilection
for studying the spread of a topic and theories of diffusion. For
example, Bradford's law [8] provides a measure for examining
how a "subject" is distributed (scattered) in the literature
according to a regular mathematical pattern across zones of
journals. This law can be used to study the journals covering a
discipline over a selected time period in order to understand the
growth of the discipline. This law and other bibliometric and
informetric measures offer a view of diffusion. Addressing
diffusion more directly, information and library science
researchers have applied Rogers’ diffusion of innovations theory
[9] to topics such as collaboration technology evaluation [10] and
the implementation and adoption of online searching [11]. These
examples support an initial inquiry into cultural diffusion as a
construct for explaining changes in information organization, and
motivate us to investigate more formally whether ideas, objects,
and behaviors representing change in organizing information, in
the context of information/library science practice, can be
explained, to any degree, by cultural diffusion.
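Bradford’s law is usually stated in terms of zones. Rank the journals covering a subject in decreasing order of productivity and divide them into three zones that each contribute roughly the same number of relevant articles; the numbers of journals in successive zones then grow geometrically. In the standard formulation, with N_1, N_2, N_3 the number of journals in each zone and n the Bradford multiplier (an empirically fitted constant for the subject):

```latex
N_1 : N_2 : N_3 \;\approx\; 1 : n : n^{2}
```

As a purely hypothetical worked example, if the first zone holds 10 core journals and n = 4, the second zone would need about 10 × 4 = 40 journals and the third about 10 × 16 = 160 journals to yield the same number of articles on the subject, which is what is meant by the subject being “scattered” across the literature.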
3. The Dryad Repository
Dryad is a repository for data objects supporting
published research in the field of evolutionary biology and related
disciplines. Formerly known as DRIADE, the Dryad repository
manages the life-cycle of heterogeneous digital data associated
with publications by facilitating data deposition (acquisition),
preservation, discovery, sharing, use, and reuse. The repository
was launched via a collaboration involving the National
Evolutionary Synthesis Center (NESCent) and UNC’s School of
Information and Library Science, Metadata Research Center
(MRC). Dryad seeks to balance the need for low barriers, inviting
contributions from a wide range of scientists in evolutionary
biology, with higher-level goals of supporting computational data
analysis for advancing evolutionary biology.
Organizing information is critical to the success of
Dryad. A chief goal is to organize Dryad’s objects, although
approaches implemented in traditional organizing information
systems (e.g., library catalogs and indexes) are not fully suitable.
Repository stakeholders have articulated a need and desire for
search/retrieval capabilities, and they recognize that some form
of metadata standard and, more significantly, a vocabulary
standard (e.g., a subject thesaurus) will help achieve these
functions. Dryad differs from more traditional systems in that
scientists are required to contribute data objects and create a
minimal representation of them, so that the objects can be
tightly coupled with the published research they support. Our
goal is to employ automatic techniques as much as possible to
generate metadata. Dryad users are also looking for Web 2.0
functionalities, such as support for annotation and folksonomies,
personalization of system features to help with querying or to
alert scientists to new topics (recommender systems), and
automatic processing to achieve syntactic interoperability among
data sets generated with different software [12].
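The combination of a minimal depositor-supplied representation, a metadata standard, and a vocabulary standard can be sketched as a small validation step. The Python below is a hypothetical illustration, not Dryad’s actual profile or software: the element names are Dublin Core-style, the subject terms stand in for a thesaurus, and `dc:relation` is assumed to carry the link to the published article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementRule:
    source: str     # scheme the element is borrowed from
    required: bool  # must the depositor supply it?


# Hypothetical minimal profile; NOT Dryad's actual application profile.
PROFILE = {
    "dc:title": ElementRule("Dublin Core", True),
    "dc:creator": ElementRule("Dublin Core", True),
    "dc:subject": ElementRule("Dublin Core", False),
    "dc:relation": ElementRule("Dublin Core", True),  # assumed link to the article
}

# Hypothetical controlled vocabulary standing in for a subject thesaurus.
SUBJECT_TERMS = {"phylogenetics", "population genetics", "speciation"}


def validate(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for element, rule in PROFILE.items():
        if rule.required and not record.get(element):
            problems.append(f"missing required element {element}")
    for element in record:
        if element not in PROFILE:
            problems.append(f"element {element} is not in the profile")
    for term in record.get("dc:subject", []):
        if term not in SUBJECT_TERMS:
            problems.append(f"subject '{term}' is not a controlled term")
    return problems


record = {
    "dc:title": "Wing measurements for a hypothetical beetle study",
    "dc:creator": "Doe, J.",
    "dc:subject": ["speciation", "beetles"],
}
print(validate(record))
# ['missing required element dc:relation', "subject 'beetles' is not a controlled term"]
```

In a workflow like the one described, automatic techniques could pre-fill elements such as the article link, leaving scientists to supply only the minimal representation.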
4. Questions about Change
Dryad exemplifies change and can help us understand the change
and transformation taking place in organizing information, as it
draws from traditional information organization practices while
incorporating newer approaches. Dryad invites exploration of
change and serves as a probe for considering the applicability of
theories that might help us understand the transformation of
organizing information systems. Questions guiding this initial
inquiry are:

Are aspects of evolutionary theory visible that might
help explain changes in organizing information
systems?

Are aspects of cultural diffusion visible that might
explain changes in organizing information systems?
5. An Approach to Exploring Questions about Change
Dryad development was initiated in fall 2006, and the repository
has been in operation for a little over a year. One of the first
steps was to develop functional requirements. Formal activities
have included an analysis of functionalities found in existing
repositories; a stakeholders’ workshop/roundtable discussion held
in December 2006, which included representatives from the major
journals in the field of evolutionary biology, scientists, and
representatives of major societies in evolutionary biology; and the
recent workshop “Digital Data Preservation, Sharing, and
Discovery: Challenges for Small Science Communities in the
Digital Era”
(https://www.nescent.org/wg_digitaldata/Public:DRIADE_Workshop_May_2007),
which included stakeholders and experts in digital data and
organizing information. Information culled from these activities
is guiding Dryad’s development and illustrates how the theories of
evolution and cultural diffusion can explain some of the more
novel organizing information practices.
6. Reflecting Theories of Change in Dryad
Dryad displays aspects of the theories of evolution and cultural
diffusion. Within the construct of evolutionary theory, natural
selection, inheritance, and adaptation are applicable, and they are
sometimes intermingled. Examples are presented here:

Metadata standards and natural selection, inheritance,
and adaptation. Natural selection intermingled with
adaptation may explain, on some level, the use of metadata
standards. As indicated above, repository stakeholders have
articulated a need and desire for search/retrieval capabilities
and recognize that the use of some form of metadata standard
and a standardized vocabulary (e.g., a subject thesaurus) will
help achieve these desired functions. The selection of these
features may be a case of natural selection, in that they are
integral to organizing information.
Dryad’s functional requirements also include a metadata
application profile inheriting favorable properties from Dublin
Core, DDI (Data Documentation Initiative), EML (Ecological
Metadata Language), and the PREMIS (PREservation Metadata:
Implementation Strategies) data dictionary; such a profile has
been developed to support the repository’s first phase. The
development of an “application profile” might also be viewed
as an adaptation, in that it keeps useful features while
developing a newer and more robust scheme.


Annotation and folksonomies as adaptations. The
development of annotation and folksonomy functionalities,
where users can provide input both in the form of
commentary and tagging, might be explained as adaptations
of information organization practices that were previously
and primarily the responsibility of information professionals.
This activity has become less formalized and less structured,
and is increasingly found in information repositories and
other types of information environments that support input
from users and creators of information, instead of
information professionals.
The concept of a work via natural selection and
adaptation. Although modern (in the 19th/20th-century
sense) information systems have not emphasized the work in
the way we might understand it intellectually, the field
of library and information science has long recognized the
centrality of the idea of a work [13]. The concept of a work
is central to Dryad, which represents two first-class objects:
the publication and the data objects supporting it. The
current publication process requires evolutionary biologists to
deposit certain data in specialized data repositories (e.g.,
GenBank and TreeBase), and additional supplementary data in
journal repositories (e.g., those of American Naturalist and
Molecular Biology and Evolution). Tracking the life-cycle of
digital data objects as they are modified is important for
supporting accurate and intelligent reuse of data stored in
Dryad. Dryad’s support of data object reuse may then be viewed
as a process of natural selection. Dryad objects that are
modified as a result of reuse, and then deposited in connection
with another publication, may be considered adaptations. Their
life-cycle might be accurately described with concepts drawn
from bibliographic relationship taxonomies [14,15,16], which
could be used to track data objects and would contribute to our
understanding of instantiation [17].
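One way to picture life-cycle tracking with bibliographic relationship concepts is a chain of derivative links: each reused and modified data object records the object it was derived from, so its lineage can be walked back to the original deposit. The Python below is a hypothetical sketch; the identifiers, the `derived_from` field, and the single-parent model are assumptions for illustration, not Dryad’s data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataObject:
    identifier: str                     # hypothetical repository identifier
    publication: str                    # article the object supports
    derived_from: Optional[str] = None  # identifier of the source object, if any


def lineage(objects: dict[str, DataObject], identifier: str) -> list[str]:
    """Follow derivative links back to the original deposit (newest first)."""
    chain = []
    seen = set()
    current: Optional[str] = identifier
    while current is not None:
        if current in seen:  # guard against a malformed, cyclic record
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        chain.append(current)
        current = objects[current].derived_from
    return chain


# Hypothetical deposits: a data set reused and modified for two later papers.
OBJECTS = {
    "data-1": DataObject("data-1", "article-A"),
    "data-2": DataObject("data-2", "article-B", derived_from="data-1"),
    "data-3": DataObject("data-3", "article-C", derived_from="data-2"),
}
print(lineage(OBJECTS, "data-3"))  # ['data-3', 'data-2', 'data-1']
```

Mapping the chain to `OBJECTS[i].publication` would likewise recover the sequence of articles each version supported.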
As with evolutionary theory, Dryad exhibits elements of direct and
possibly indirect diffusion that reflect the theory of cultural
diffusion.


A team of mixed expertise allowing for direct diffusion.
Dryad’s development team includes evolutionary biologists,
computer scientists, and information/library scientists. The
team of people from different disciplines allows for a direct
diffusion of ideas and information about information systems
and behaviors or functionalities. We have found that the
evolutionary biologists on the Dryad team, and those present
at the stakeholders’ meeting, are familiar with the idea of
taxonomy in a scientific classificatory sense, for naming
species; however, they are neither aware of nor comfortable
with the structured vocabularies underlying information
systems, such as lexical semantic thesauri, subject heading
lists, and even name-authority files.
Specificity and exhaustivity via direct diffusion. Over
the course of the last year, it has become clear that principled
subject access to Dryad’s data objects is critical for resource
discovery and for promoting data sharing and reuse.
Members of Dryad’s development team are working to
spread knowledge of the principles of specificity and
exhaustivity via direct diffusion through a user tutorial and
the development of an “ASK-the-metadata expert” system
linking information professionals and scientists.

Moving forward via indirect diffusion. Indirect diffusion
implies that some entity (e.g., a person, a country, or an
object) stands between two cultures, yet diffusion still
takes place over the course of time. Indirect diffusion may be
taking place as evolutionary biologists, and scientists in
general, become more accustomed to digital repositories and
data deposition practices, since they are required to deposit
their supplemental data in various journal repositories (e.g.,
again, American Naturalist and Molecular Biology and Evolution
require researchers to submit supplemental data for
publications) as part of the publication process. Dryad may be
the natural next step for scientists in this domain, given their
use of various repositories (e.g., journal repositories and more
specialized repositories, such as GenBank and TreeBase). This
idea and the others require more analysis, but it seems likely
that indirect diffusion is occurring on multiple levels as
scientists (in Dryad’s case, evolutionary biologists) work daily
not only with repositories, but with networked technology and
information resources in general.
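Specificity and exhaustivity can be made concrete with a small thesaurus fragment. In this sketch, exhaustivity is approximated as the number of distinct descriptors assigned to a data object, and the specificity of a descriptor as its depth below a top term along broader-term (BT) links. Both operationalizations, and the terms themselves, are hypothetical choices for illustration, not the content of Dryad’s tutorial or vocabulary.

```python
# Hypothetical thesaurus fragment: each term maps to its broader term (BT);
# top terms map to None.
BROADER = {
    "evolutionary biology": None,
    "population genetics": "evolutionary biology",
    "gene flow": "population genetics",
    "phylogenetics": "evolutionary biology",
}


def depth(term: str) -> int:
    """Specificity proxy: number of BT links from the term up to a top term."""
    steps = 0
    while BROADER[term] is not None:
        term = BROADER[term]
        steps += 1
    return steps


def indexing_profile(descriptors: list[str]) -> dict:
    """Summarize one indexing decision for a data object."""
    terms = set(descriptors)
    if not terms:
        return {"exhaustivity": 0, "max_specificity": 0}
    return {
        "exhaustivity": len(terms),  # how many distinct concepts are covered
        "max_specificity": max(depth(t) for t in terms),  # deepest term used
    }


print(indexing_profile(["evolutionary biology"]))        # {'exhaustivity': 1, 'max_specificity': 0}
print(indexing_profile(["gene flow", "phylogenetics"]))  # {'exhaustivity': 2, 'max_specificity': 2}
```

The first indexing decision is broad and shallow; the second is both more exhaustive and more specific, which is the kind of difference a tutorial on these principles would aim to make visible to depositors.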
7. Conclusion
The paper considered theories of evolution and cultural diffusion
for explaining change in organizing information practices. It
discussed these theories and their value for studying change. A
case study focused on the Dryad repository permitted exploration
of these theories. This inquiry indicates that the theories of
evolution and cultural diffusion are applicable to Dryad, a
repository that is reflective of change. Within the construct of
evolutionary theory, natural selection, inheritance, and
adaptation are applicable, and they are sometimes intermingled.
Dryad also exhibits elements of direct and possibly indirect
diffusion that reflect the theory of cultural diffusion. The
application of these theories within Dryad may encourage us to
explore these topics further, help us understand and contextualize
change in organizing information practices, and identify
significant new research questions for advancing our field.
8. ACKNOWLEDGMENTS
Our thanks to ACM SIGCHI for allowing us to modify templates
they had developed.
9. REFERENCES
[1] O'Reilly, T. 2005. What Is Web 2.0: Design Patterns and
Business Models for the Next Generation of Software.
O’ReillyNet:
http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html.
[2] Miller, P. 2006. Coming Together around Library 2.0: A
Focus for Discussion and a Call to Arms. D-Lib Magazine,
12(4): http://www.dlib.org/dlib/april06/miller/04miller.html.
[3] Darwin, C. 1859. On the Origin of Species by Means of
Natural Selection, or, the Preservation of Favoured Races in
the Struggle for Life. London: J. Murray.
[4] Forrest, S. 1993. Genetic Algorithms: Principles of
Natural Selection Applied to Computation. Science, 261:
872-878.
[5] Green, S. and Kent, J. P. 2002. The Metadata Life Cycle. In
J.-P. Kent (Ed.), MetaNetWork Package 1: Methodology and
Tools, 29-34.
[6] Schaefer, J. M. (Ed.). 1974. Studies in Cultural Diffusion:
Galton's Problem, Vols. I and II. Cross-Cultural Research Series.
[7] Berry, J. W. 1979. Social and Cultural Change. In H. C.
Triandis and R. Brislin (Eds.), Handbook of Cross-Cultural
Psychology, 5: 211-279. Boston: Allyn and Bacon.
[8] Bradford, S. C. 1953. Documentation. 2nd ed. London:
Crosby Lockwood.
[9] Rogers, E. M. 1983. Diffusion of Innovations. 5th ed. New
York: Simon and Schuster.
[10] Sonnenwald, D. H., Maglaughlin, K. L., and Whitton, M. C.
2001. Using Innovation Diffusion Theory to Guide Collaboration
Technology Evaluation: Work in Progress. In Proceedings of the
IEEE International Workshop on Enabling Technologies, 114-119.
[11] Marshall, J. G. 1990. Diffusion of Innovation Theory and
End-User Searching. Library and Information Science
Research, 12(1): 55-69.
[12] Dube, J., Carrier, S., and Greenberg, J. 2007. DRIADE: A
Data Repository for Evolutionary Biology. In JCDL 2007:
Proceedings of the 2007 Conference on Digital Libraries
(Vancouver, BC, Canada). ACM Press, 481.
[13] Smiraglia, R. P. 2003. The History of “The Work” in the
Modern Catalog. Cataloging & Classification Quarterly,
553-567.
[14] Smiraglia, R. P., and Leazer, G. H. 1999. Derivative
Bibliographic Relationships: The Work Relationship in a
Global Bibliographic Database. Journal of the American
Society for Information Science, 50(6): 493-504.
[15] Tillett, B. 1991. A Taxonomy of Bibliographic
Relationships. Library Resources & Technical Services,
35(2): 150-158.
[16] Tillett, B. B. 1992. Bibliographic Relationships: An
Empirical Study of the LC Machine-Readable Records.
Library Resources & Technical Services, 36(2): 162-188.
[17] Smiraglia, R. P. 2005. Instantiation: Toward a Theory. In
L. Vaughan (Ed.), Data, Information, and Knowledge in a
Networked World: Proceedings of the Canadian Association
for Information Science Annual Conference, June 2-4, 2005:
http://www.cais-acsi.ca/search.asp?year=2005.