UCSB Campus Informatics: Collaboration for Knowledge Management

Faculty Research Data:
Informatics and Archiving
Sarah M. Pritchard
University Librarian
University of California, Santa Barbara
ECURE 2005, Phoenix, AZ
Informatics: A Definition
 The study of the structure and behavior of natural
and artificial systems designed to process data
 Development of tools to ingest and interpret large
stores of data in heterogeneous and distributed
systems
 Integration of data (numeric, textual, image,
spatial) with tools for modeling, trend analysis,
mapping, image processing, etc.
 Business applications not studied in this context
March 1, 2005
Informatics at UCSB
 Emergence of informatics as a specialty in several
academic departments, notably environmental
sciences
 Highly interdisciplinary faculty
 Development of unique stand-alone systems for
managing collaborative research data
 No ongoing mechanisms for communication and
technical coordination
 Campus and consortial projects emerging for
digital publications and for instructional support
but not yet for research data
Faculty Research Data
 Large numeric data sets from physical
sciences and laboratory research
 Imaging – geosciences, neurosciences
 Fieldwork – environmental, archaeological
 Customized interpretive and manipulation
tools
 Drafts, correspondence, notes
UCSB Computing Environment
 One of the original nodes of the Internet
 No centralized academic computing organization
 Offices for networking and for instructional
support
 Individual colleges and departments have
developed their own servers and support for research
data and teaching tools
 High-level campus policy board for IT issues brings
some coordination
UCSB Library Context
 Alexandria Digital Library
(www.alexandria.ucsb.edu)
    Extension into new disciplinary applications
    Heterogeneous metadata ingest
    Extensive backup and archiving architecture
    Long record of faculty collaboration
    NDIIPP
 California Digital Library (www.cdlib.org)
    Digital preservation initiatives for published documents and for (under development) government information web sites
    eScholarship program to support publication of online journals, preprint archives
    Online Archive of California – special collections support
 Other faculty support
    Electronic reserves including streaming audio reserves
    Digital document delivery to the desktop
What questions emerge from this?
 Why are faculty building informatics systems?
 Is valuable research time and funding being spent
on tangential work?
 Are there commonalities across informatics
applications and disciplines?
 Is there redundancy in tool development?
 Can data be openly accessed or shared?
 Are digital library concerns (metadata, IP rights,
archiving) incorporated?
Informatics Project Goals
 Create stronger linkages among relevant faculty
research projects
 Identify components and needs in informatics and
the management of research data
 Assess the degree of commonality in informatics
tools and functionality
 Determine whether more support is needed for
data archiving, metadata, interfaces, IP
 Develop a planning agenda for informatics in a
distributed environment
 Inform the design of facilities and services
Project Components
 Background research in current informatics work
in academic disciplines
 Structured interviews and site visits with selected
faculty
 Matrix of system characteristics and issues
 Informal roundtables for faculty working in these
areas
 Collaboration with related IT units
 White paper for campus discussion of futures
UCSB Informatics: Participants
 Faculty chosen on the basis of
    Innovative science
    Data-intensive work
    Interdisciplinary research
    Recommendation by the Office of Research, colleagues, department heads, IT offices and librarians
 Control Group: Non-science faculty
    A select group of technologically innovative faculty in other disciplines was used as a control to determine whether trends were specific to sciences
 About 40 people interviewed
Sample Questions for Faculty

How do you store research information?

Do you do any cataloging, indexing, or metadata?

How are your data maintained on an on-going basis?

Is there something special about the way that you manage
your data compared to colleagues within the field?

Do you write or borrow scripts/tools? For what purpose?

Are you having difficulty managing your data collection? Are
there services that you wish others would provide?

How is IP and sharing of datasets/information handled in your
field?

When you collaborate with others through the web what kinds
of tools, if any, do you use?

What are your plans for this research in the next five years?
Are there service requirements that you will need then?
Findings: Growth of Systems

The sophistication of informatics arrangements is determined
by the amount of data collected and how labor-intensive it is to
collect.

Change happens when the following converge:
 Data size increases exponentially
 Research questions encompass broad range of specialties
 Funding agencies require change for funding

Guiding principles seem to be:
 “What is the smallest group of people that I can have do
the work, and still do the [work]”
 “What is the least amount of indirect work [e.g.,
informatics] related to the research that I can do, and still
do the [work]”
Findings: Data Preservation
Chart: Perceived Long-term Preservation Need of Faculty and Staff Researchers
Critical Need: 16%
Future Need: 3%
Some Need: 50%
Impact Unknown: 31%
Findings: Data Preservation

Some science fields have national and international data
centers where data deposit is required for grant funding.

Where data centers do not exist, backup depends on:
    Length of a grant
    Length of time the primary researcher is on campus
    Perception that data has maximum value for 12-18 months after publication, and negligible value after 5-10 years

Departments lack personnel and support for long-term
preservation of data.

Faculty store data on the “removable media of the day” and
forget about it, until it becomes difficult or impossible to access

More complex systems, same number of people to manage
them, leads to less time to devote to “meta-issues”

Critical impact: research collaboration and long term historical
data analysis suffer
Data Preservation Practices
Contribute to a nongovernmental portal: 14%
Contribute to National Supercomputer Center: 3%
Archive data on CD/DVD: 12%
Relies on back-up (short term storage): 20%
Stores on multiple machines: 22%
Varies with project: 3%
No strategy: 5%
Contribute to Agency Data Center or Portal: 7%
Run & Maintain Portal: 14%
Findings: Data Organization

Most common organizing mechanism – directory structure,
spreadsheets, and word processing software

Databases (with or without metadata) are uncommon. Viewed
as time/labor-intensive, unnecessary drain on research time.

Portals built by tech specialists within a field are well utilized.

Storage space is adequate for now. Over half the people
contacted were in the process of upgrading.

Most departments did not have strictly enforced limits on email,
data storage, and personal storage

Though much on their servers is “garbage,” memory is thrown
at the problem; little support in most departments for data
management

“Not a solved problem.” While actual memory might be cheap,
tape, labor, and other equipment to ensure that data are
maintained is NOT.
Findings: Metadata issues
 Metadata is discipline specific; commonalities
exist, but key requirements of a discipline vary.
 Metadata structures and subject taxonomies
reflect the way faculty in a discipline think
 While organizational structure is an important
issue in metadata use, other considerations are:
    Services available in one’s discipline
    Acceptance and standardization in the discipline
    Usage in key portals, data centers, and repositories
 One worldwide metadata format is not likely at
this time
 Interdisciplinary metadata issues and crosswalks
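
Note on crosswalks: a crosswalk is a mapping from one discipline's metadata fields to a shared vocabulary. The sketch below is only an illustration of the idea, not a description of any system surveyed. Every field name on both sides of the mapping is hypothetical, and a real crosswalk would also have to reconcile value formats and controlled vocabularies.

```python
from typing import Dict, Tuple

# Hypothetical crosswalk: all field names here are illustrative assumptions,
# not taken from any schema named in the slides.
CROSSWALK: Dict[str, str] = {
    "site_name": "title",
    "principal_investigator": "creator",
    "collection_date": "date",
    "bounding_box": "coverage",
}


def apply_crosswalk(
    record: Dict[str, str], crosswalk: Dict[str, str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Translate a record into the shared vocabulary.

    Returns (mapped, unmapped). Fields with no equivalent are set aside
    rather than dropped, since discipline-specific detail is often the
    part that does not translate.
    """
    mapped: Dict[str, str] = {}
    unmapped: Dict[str, str] = {}
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped[field] = value
        else:
            mapped[target] = value
    return mapped, unmapped


# Made-up example record from a hypothetical fieldwork project.
field_record = {
    "site_name": "Example field site",
    "principal_investigator": "A. Researcher",
    "collection_date": "2004-06-15",
    "soil_horizon": "B",
}
mapped, unmapped = apply_crosswalk(field_record, CROSSWALK)
print(mapped)
print(unmapped)
```

The unmapped fields returned by the sketch are one way to see why a single worldwide format is unlikely: the key requirements of a discipline tend to be the fields that have no shared equivalent.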
Metadata Usage
Assisted in development of metadata: 5%
On campus usage only: 19%
Used in select projects: 11%
Rarely used: 38%
Consistent use at data centers/portals: 27%
Findings: Intellectual Property
 Intellectual property protocols that faculty follow
after creating software, portals or databases are
highly correlated with the discipline.
    In disciplines where things move quickly, the ideal method is to open source one’s tool to obtain an audience, then later align oneself with a company, or start one.
    In disciplines where there is a lot of money there is pressure to ensure patents are filed.
 Databases, portals and data centers on campus
typically all have legal waiver forms, allowing
release of the data sets to other researchers as
part of the process to ingest the data.
 Disciplines vary in the extent to which they
support an ethic of data sharing.
Digital Rights Management Practices
Have not yet encountered issues: 8%
Prefer to create open source products to avoid intellectual property issues: 22%
Intellectual property issues affect my research significantly: 30%
Occasional minor issues with an individual collaborator or publisher: 24%
Practices and procedures in industry are well tested and accepted, no major issues: 16%
Findings: Data Support Needs
 Some needs and services were mentioned across
disciplines regardless of current arrangements:
    Informatics “point person” or clearinghouse for information on tools, expertise, and research knowledge on campus and nationally
    Long term archiving of research data especially during the gap in coverage between publication and obsolescence
    Tiered support services for database development, cataloging, conversion, emulation, migration, web development, metadata, pre-planning for technology grants
Trends Shaping Future Demand
 Growth in complex data objects
 Improved data mining
 Policies of funding agencies
 National repositories
 New cyberinfrastructure initiatives
 Prevalence of campus repositories for text
 Tech-intensive academic programs
 Need for rapid and global data exchange
 Steady or decreasing staffing
Key System Characteristics
 Flexibility to customize control, interfaces and
security
 Secure access worldwide
 Metadata-agnostic design
 Interoperability with scholarly
communication, archiving and rights
management systems
 Clearinghouse functions
 Advanced services for migration, emulation,
long-term digital archiving
Topics for Campus Discussion
 Where are the gaps in current offerings?
 How do technology services on campus
interact, and are new organizational models
needed?
 What are faculty priorities for various services?
 What kinds of research data should be high
priority for preservation, and how much is at
risk?
 What are incentives for faculty participation?
 What is the impact of tenure and promotion
structures in encouraging “data maintenance
work”?
Possible outcomes
 Everything stays as is
 More peer-to-peer sharing of resources and
expertise
 Policies are established
    Intellectual property rights at several levels
    Use of metadata and digital object standards
    Ensure data sustainability
 Organizational approaches are considered
    IT offices, the library, consortial systems support, disciplinary groups, or a combination
 New services are offered
    Database design
    Metadata creation
    Consulting
    Clearinghouse functions
    Full digital archiving and migration
Further Information
 UCSB Informatics Project web site:
http://www.library.ucsb.edu/informatics/
 ECAR Research Bulletin, vol. 2005, Issue 2:
“Informatics and Knowledge Management for
Faculty Research Data,” Jan. 18, 2005
Contact:
 Sarah M. Pritchard, University Librarian
pritchard@library.ucsb.edu
 Larry Carver, Director of Library Technologies and
Digital Initiatives, carver@library.ucsb.edu
Special thanks to Smiti Anand, Project Analyst