Innovations for Botanical Collections

Project Title: Deploying CollectionSpace for the UC Botanical Garden: Innovations for
Botanical Collections
Submitter: David Greenbaum, Director, Research Information Technologies (UCB IST-RIT),
dag@berkeley.edu, 510-642-9025
Project Leaders: Chris Hoffman and Patrick Schmitz
Team Members: Amy Wieliczka, Aron Roberts, Glen Jackson, John Lowe, Lam Voong, Ray
Lee, Richard Millet, Rick Jaffe, Tom Schnetlage, Yuteh Cheng (IST-RIT). Holly Forbes and
Barbara Keller (UC Botanical Garden).
Project description:
On April 29, 2013, the UC Botanical Garden began using CollectionSpace as its production
collection management system, marking the culmination of a project that is notable for its
innovations, its alignment with university goals to reduce costs and streamline operations, and
the close collaboration between staff in the UC Botanical Garden and IST.
The UC Botanical Garden is a non-profit research garden and museum of the University of
California at Berkeley, with a notably diverse plant collection that includes many rare and
endangered plants. Established in 1890, the Garden, which is open to the public year-round, has
over 13,000 different kinds of plants from around the world, cultivated by region in naturalistic
landscapes over its 34 acres. As at other museums and collections, the curators, registrars,
scientists, and students who manage and steward the vital assets in the Garden's collection use a
software application called a collection management system. In addition to managing information about
the collection objects themselves (such as their name, description and other metadata about the
object), a collection management system is used to track business processes related to the objects
(e.g., research visits and acquisitions), handle images of the objects, and manage the controlled
vocabularies and hierarchical authorities (e.g., people, organizations, and scientific taxonomic
names) used to standardize information across the objects and business processes.
For the past two decades, a UC Berkeley team in IST-Research Information Technologies has
developed and maintained collection management systems for nine museums collecting a wide
range of materials, from Art to Zoology. However, too many software systems were in use, each
with different technology requirements and skill sets, leading to a set of risks and legacy
technology problems. In 2010, CollectionSpace was selected as the strategic platform for UC
Berkeley collection management systems. The deployment for the UC Botanical Garden is the
third collection migrated into CollectionSpace, following deployments for the University and
Jepson Herbaria, and the Phoebe A. Hearst Museum of Anthropology.
CollectionSpace itself is innovative software that provides an enterprise-ready platform for
collections management. With initial funding from the Andrew W. Mellon Foundation,
CollectionSpace was designed so that it can be customized to meet the needs of different kinds of
collections. Its extensibility covers the information model, business process
support, integration, and, of course, branding and styling. Furthermore, CollectionSpace is built
for the modern data center and supports multi-tenancy. CollectionSpace is built using open
source software (see Technology Utilized below), and is open source itself (ECL-2.0 license).
CollectionSpace provides RESTful APIs for most functionality.
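To give a sense of what working with these RESTful APIs looks like, the Python sketch below builds (without sending) an authenticated request for one page of collection-object records. The host and port, the `/cspace-services/collectionobjects` path, HTTP Basic authentication, and the `pgSz`/`pgNum` paging parameters are assumptions made for illustration; check them against the CollectionSpace services documentation for the release in use.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# Assumed base URL for illustration only; substitute a real server.
BASE_URL = "http://localhost:8180/cspace-services"


def build_list_request(resource: str, user: str, password: str,
                       page_size: int = 40, page_num: int = 0) -> Request:
    """Build (but do not send) a GET request for one page of a record list.

    The paging parameter names pgSz/pgNum are an assumption in this sketch.
    """
    query = urlencode({"pgSz": page_size, "pgNum": page_num})
    url = f"{BASE_URL}/{resource}?{query}"
    # HTTP Basic authentication: base64 encoding of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(url, method="GET", headers={
        "Authorization": f"Basic {token}",
        "Accept": "application/xml",
    })


# Hypothetical account used only to show the shape of the request.
req = build_list_request("collectionobjects", "user@example.org", "secret")
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a live server; keeping request construction separate makes it easy to inspect or test without network access.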
Taking maximum advantage of the design features of CollectionSpace, the UC Berkeley
deployment team made many customizations and built several capabilities to meet the
requirements of the UC Botanical Garden. Botanical collections are interesting in that they are
collections of living things. The team had to build fields and workflows to track the
different places in the garden where each plant has been planted, along with information about its status
(such as when a plant died). The team added two procedures to support business processes for
plant propagation (e.g., the techniques and treatments applied to grow a plant from seed)
and the printing of pot tags (the little stakes one sees in potted plants at a garden store). The
local deployment team also extended existing CollectionSpace functionality, including new
development related to batch processing and event handling in order to support complex business
logic at the Garden. For several years, the UC Botanical Garden had used a business intelligence
tool (Business Objects) for reporting and data analysis. To replace this reporting system,
the team built numerous reports leveraging iReport, the open source reporting tool incorporated
by the CollectionSpace project. A set of Postgres functions helps provide data for these reports.
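As an illustration of the kind of derived data such report functions can supply, the sketch below computes each living plant's current bed from a history of location and status events, echoing the location tracking and plant-death status described above. It is written in Python rather than as a Postgres function, and the `LocationEvent` record, its fields, the `"alive"`/`"dead"` status values, and the accession numbers are all hypothetical; the Garden's actual schema is not shown here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable


@dataclass(frozen=True)
class LocationEvent:
    """One entry in a plant's (hypothetical) location and status history."""
    accession: str  # made-up accession number
    when: date      # date the event was recorded
    bed: str        # garden bed at the time of the event
    status: str     # hypothetical vocabulary: "alive" or "dead"


def current_locations(events: Iterable[LocationEvent]) -> Dict[str, str]:
    """Map each plant still alive to the bed named in its most recent event."""
    latest: Dict[str, LocationEvent] = {}
    for ev in events:
        prev = latest.get(ev.accession)
        # Keep the newest event per plant; on a date tie, the later-listed one wins.
        if prev is None or ev.when >= prev.when:
            latest[ev.accession] = ev
    # Plants whose newest event records a death drop out of the result.
    return {acc: ev.bed for acc, ev in latest.items() if ev.status == "alive"}


history = [
    LocationEvent("89.0123", date(1990, 3, 1), "219", "alive"),
    LocationEvent("89.0123", date(2005, 6, 15), "224", "alive"),  # moved
    LocationEvent("02.0456", date(2002, 4, 2), "230", "alive"),
    LocationEvent("02.0456", date(2011, 1, 9), "230", "dead"),
]
print(current_locations(history))  # {'89.0123': '224'}
```

Keeping the full event history and deriving the current state on demand preserves the record of every past planting, which is what allows reports to answer both "where is it now?" and "where has it been?".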
In addition, web applications were built that allow Garden staff to perform hierarchical searches
across dimensions of the Botanical Garden collections data (e.g., "show me all plants in the
genus Rhododendron located in beds 219 through 230"). More details about the report and web
application development are available at
http://wiki.collectionspace.org/display/deploy/UCBG+reports.
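The example query above combines a hierarchical dimension (taxonomy) with a range filter (bed numbers). A minimal Python sketch of that idea follows; it assumes a hypothetical child-to-parent taxon table, integer bed numbers, and made-up accession identifiers, and does not reproduce the Garden's web applications. Each plant's taxon is walked up toward the root to decide whether it falls under the requested genus.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical taxonomy table: each name maps to its parent (None at the top).
PARENT: Dict[str, Optional[str]] = {
    "Ericaceae": None,
    "Rhododendron": "Ericaceae",
    "Rhododendron occidentale": "Rhododendron",
    "Rhododendron macrophyllum": "Rhododendron",
    "Arctostaphylos": "Ericaceae",
    "Arctostaphylos manzanita": "Arctostaphylos",
}

# Hypothetical plant records: (accession id, taxon name, bed number).
PLANTS: List[Tuple[str, str, int]] = [
    ("A1", "Rhododendron occidentale", 219),
    ("A2", "Rhododendron macrophyllum", 231),
    ("A3", "Arctostaphylos manzanita", 225),
    ("A4", "Rhododendron macrophyllum", 230),
]


def is_within(taxon: str, ancestor: str) -> bool:
    """True if `taxon` is `ancestor` or lies below it in PARENT (assumes no cycles)."""
    node: Optional[str] = taxon
    while node is not None:
        if node == ancestor:
            return True
        node = PARENT.get(node)  # unknown names end the walk
    return False


def search(ancestor: str, bed_lo: int, bed_hi: int) -> List[str]:
    """Accessions under `ancestor` located in beds bed_lo..bed_hi inclusive."""
    return [acc for acc, taxon, bed in PLANTS
            if is_within(taxon, ancestor) and bed_lo <= bed <= bed_hi]


print(search("Rhododendron", 219, 230))  # ['A1', 'A4']
```

Walking up the tree keeps the check simple for small data; a larger system would more likely precompute descendant sets or use a recursive database query.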
One of the most challenging aspects of launching a transactional system like this is the migration
of data from the legacy system into the new one. Again, the UCB deployment team took full
advantage of the advanced capabilities built into CollectionSpace, including the RESTful APIs
and the CollectionSpace import service. The team had also built up significant local experience and
tools during the two deployment projects that preceded this one. Working with staff in the
Botanical Garden, the team performed significant data cleaning that will have a lasting impact on
the Garden's operations. The team developed SQL queries to extract data from the legacy system
and incorporated those queries into ETL (extract, transform, and load) jobs written in the open
source Talend Open Studio platform in order to create the data payloads that were then imported
into CollectionSpace. Numerous scripts were written and shared among several data
developers to streamline this ETL work. Documentation related to data migration is available at
http://wiki.collectionspace.org/display/deploy/UCBGCollectionSpace+data+mapping%2C+v2.4.
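The transform step of such an ETL job can be sketched in a few lines. The team's jobs were built in Talend Open Studio; the Python below is only a stand-in showing the general pattern of cleaning an extracted legacy row and serializing it as an XML payload. The legacy column names, `FIELD_MAP`, and the `record` element names are hypothetical; the actual CollectionSpace import-service payload format is defined by the project's documentation and is not reproduced here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical mapping from legacy column names to target field names.
FIELD_MAP = {
    "acc_num": "objectNumber",
    "taxon": "taxonName",
    "bed": "currentBed",
}


def clean(value: Optional[str]) -> Optional[str]:
    """Collapse internal whitespace and turn blank strings into None."""
    if value is None:
        return None
    cleaned = " ".join(value.split())
    return cleaned or None


def to_payload(row: Dict[str, Optional[str]]) -> str:
    """Transform one extracted legacy row into an XML record payload."""
    record = ET.Element("record")  # hypothetical root element
    for legacy_col, field in FIELD_MAP.items():
        value = clean(row.get(legacy_col))
        if value is not None:  # omit empty fields rather than emitting blanks
            ET.SubElement(record, field).text = value
    return ET.tostring(record, encoding="unicode")


legacy_row = {"acc_num": " 89.0123 ", "taxon": "Rhododendron   occidentale", "bed": ""}
print(to_payload(legacy_row))
```

Doing the cleaning inside the transform, rather than after import, means every payload reaching the new system has already been normalized.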
The project management methods and tools used in this deployment were based on best practices
and experience from the two earlier deployments and from the CollectionSpace project itself.
Key to this effort was close collaboration with staff in the Botanical Garden. At all stages of the
project, they were involved in requirements gathering, documentation, data analysis, design, and
testing. Openness was a key principle, with all information documented in a project wiki
(http://wiki.collectionspace.org/display/deploy/UC+Botanical+Garden) and Jira. Software for all
UC Berkeley deployment projects resides in GitHub, where CollectionSpace code is available as
well (https://github.com/cspace-deployment).
With their migration to CollectionSpace, the UC Botanical Garden can now steward their
collection in a robust collection management system that is stable, secure, and efficient.
Importantly, it is ready for the future and can grow thanks to the modularity of CollectionSpace.
In addition, they now have a much more usable, intuitive, and accessible system, one that is
based on web standards and professional user-centered design principles and practices. Their
former system was a client-server system (based on the X Window System) with screen and field
behaviors that imposed a steep learning curve.
From the perspective of the service provider, UC Berkeley's Information Services and
Technology division has now migrated three collection management systems to the new
platform, one selected for its ability to drive down costs and reduce risk while supporting the
excellence of these research collections. We have successfully demonstrated that
CollectionSpace can be customized for very different kinds of collections and have extended
built-in capabilities such as the RESTful API, batch processing and event handling. A public
portal to the collection is being designed, an important step toward improving access to the
collection. The stability of the collection management system and the openness of the system in
general significantly enhance the research competitiveness of the Botanical Garden. More
broadly, the team is now looking at opportunities to leverage information in the other
CollectionSpace instances in order to drive discovery across different kinds of collections. With
CollectionSpace also in use at the University and Jepson Herbaria and the Phoebe A. Hearst
Museum of Anthropology, significant grant funding opportunities are under discussion.
Measures of success for the CollectionSpace deployment project cover a range of areas related to
data, system functionality, log analysis, customer satisfaction, and bug reports. Imported data
were tested multiple times, both with automated methods and through extensive testing by users
working through use cases. CollectionSpace logs were analyzed, and the problem rate is lower than for earlier
deployments. Similarly, user reports of problems have been significantly lower than with earlier
deployments, and the time needed to address those problems has been short (one hour to two
days).
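The automated methods are not detailed here, but one common approach to verifying a migration is a field-by-field reconciliation between the legacy extract and the records read back from the new system. The Python sketch below assumes both sides have already been loaded into dictionaries keyed by a shared identifier; the identifiers, field names, and values are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]
Mismatch = Tuple[str, str, str, Optional[str]]


def reconcile(legacy: Dict[str, Record], imported: Dict[str, Record]
              ) -> Tuple[List[str], List[str], List[Mismatch]]:
    """Report keys missing after import, unexpected extra keys, and field mismatches."""
    missing = sorted(set(legacy) - set(imported))
    unexpected = sorted(set(imported) - set(legacy))
    mismatches: List[Mismatch] = []
    for key in sorted(set(legacy) & set(imported)):
        for field, expected in legacy[key].items():
            actual = imported[key].get(field)  # None if the field was dropped
            if actual != expected:
                mismatches.append((key, field, expected, actual))
    return missing, unexpected, mismatches


# Hypothetical sample data for both sides of the comparison.
legacy = {
    "89.0123": {"taxon": "Rhododendron occidentale", "bed": "224"},
    "02.0456": {"taxon": "Arctostaphylos manzanita", "bed": "230"},
}
imported = {
    "89.0123": {"taxon": "Rhododendron occidentale", "bed": "219"},
    "95.0789": {"taxon": "Rhododendron macrophyllum", "bed": "221"},
}
print(reconcile(legacy, imported))
```

Because the check is deterministic, it can be rerun after every trial import, which suits a migration that is tested multiple times.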
The UC Botanical Garden is a treasure for California and indeed the country. Students, faculty,
researchers around the world, and the general public visit their collections. Our project team has
been proud to help position them for the present and beyond.
Technology Utilized in the Project:
CollectionSpace is based on open source web technologies: Postgres, Tomcat, Apache, Java,
JavaScript, CSS, and HTML. The open source enterprise content management platform, Nuxeo,
provides a middle layer of services and functionality (e.g., document handling and schema
extension). CollectionSpace runs on Linux, Windows, and Macintosh. All UCB deployments
run on Linux virtual machines in the campus data center, and deployment to our university IT
infrastructure has been smooth. The CollectionSpace project is now building a set of tools to
facilitate SaaS and cloud hosting.
Timeframe of the Implementation:
Serious analysis and project planning began in October 2012. The deployment was launched
April 29, 2013.
Objective Customer Satisfaction Data:
From Holly Forbes, UC Botanical Garden Senior Museum Scientist
"The UC Botanical Garden is pleased to have launched CollectionSpace in late April 2013. The
extensive system developed for the Garden in large part by our colleagues in IS&T have done a
remarkable job to make complicated sets of data relate to each other. We are very impressed with
the extent of our positive interactions with and responsiveness from the programming team. They
really worked hard to understand our data processing needs and work flow and came up with
great solutions, including external software for reports. Part of these great interactions included
training on the system and discovering how the system could be improved based on observations
of our work flow in the new system. They really went the extra mile to incorporate our external
systems of nursery and plant sale pot tag creation."