Project Title: Deploying CollectionSpace for the UC Botanical Garden: Innovations for Botanical Collections

Submitter: David Greenbaum, Director, Research Information Technologies (UCB IST-RIT), dag@berkeley.edu, 510-642-9025

Project Leader: Chris Hoffman and Patrick Schmitz

Team Members: Amy Wieliczka, Aron Roberts, Glen Jackson, John Lowe, Lam Voong, Ray Lee, Richard Millet, Rick Jaffe, Tom Schnetlage, Yuteh Cheng (IST-RIT). Holly Forbes and Barbara Keller (UC Botanical Garden).

Project description:

On April 29, 2013, the UC Botanical Garden began using CollectionSpace as its production collection management system, marking the culmination of a project that is notable for its innovations, its alignment with university goals to reduce costs and streamline operations, and the close collaboration between staff in the UC Botanical Garden and IST.

The UC Botanical Garden is a non-profit research garden and museum for the University of California at Berkeley, having a notably diverse plant collection including many rare and endangered plants. Established in 1890, the Garden, which is open to the public year round, has over 13,000 different kinds of plants from around the world, cultivated by region in naturalistic landscapes over its 34 acres.

As with other museums and collections, the curators, registrars, scientists and students who manage and steward the vital assets in their collection use a software application called a collection management system. In addition to managing information about the collection objects themselves (such as their name, description and other metadata about the object), a collection management system is used to track business processes related to the objects (e.g., research visits and acquisitions), handle images of the objects, and manage the controlled vocabularies and hierarchical authorities (e.g., people, organizations, and scientific taxonomic names) used to standardize information across the objects and business processes.
For the past two decades, a UC Berkeley team in IST-Research Information Technologies has developed and maintained collection management systems for nine museums collecting a wide range of materials, from Art to Zoology. However, too many software systems were in use, each with different technology requirements and skill sets, leading to a set of risks and legacy technology problems. In 2010, CollectionSpace was selected as the strategic platform for UC Berkeley collection management systems. The deployment for the UC Botanical Garden is the third collection migrated into CollectionSpace, following deployments for the University and Jepson Herbaria, and the Phoebe A. Hearst Museum of Anthropology.

CollectionSpace itself is innovative software that provides an enterprise-ready platform for collections management. With initial funding from the Andrew W. Mellon Foundation, CollectionSpace was designed so that it can be customized to meet the needs of different kinds of collections. The level of extensibility allowed includes the information model, business process support, integration, and of course branding and styling. Furthermore, CollectionSpace is built for the modern data center and supports multi-tenancy. CollectionSpace is built using open source software (see Technology Utilized below), and is open source itself (ECL-2.0 license). CollectionSpace provides RESTful APIs for most functionality.

Taking maximum advantage of the design features of CollectionSpace, the UC Berkeley deployment team made many customizations and built several capabilities to meet the requirements of the UC Botanical Garden. Botanical collections are interesting in that they are collections of living things. The team had to build fields and workflows that supported tracking the different places that plants were planted in the garden and information about their status (such as when the plant died).
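To make the idea of tracking planting locations and plant status concrete, the following is a minimal Python sketch of one possible data structure. It is not the CollectionSpace information model or the Garden's actual schema: the class names, field names, accession number, and dates are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Planting:
    """One placement of a plant in a garden bed (hypothetical structure)."""
    bed: str
    planted_on: date
    removed_on: Optional[date] = None  # None while the plant is still in this bed


@dataclass
class LivingAccession:
    """Hypothetical record for a living plant; not the CollectionSpace schema."""
    accession_number: str
    taxon: str
    plantings: List[Planting] = field(default_factory=list)
    died_on: Optional[date] = None

    def move_to(self, bed: str, on: date) -> None:
        """Close the current planting, if any, and open a new one."""
        if self.died_on is not None:
            raise ValueError("cannot move a plant recorded as dead")
        if self.plantings and self.plantings[-1].removed_on is None:
            self.plantings[-1].removed_on = on
        self.plantings.append(Planting(bed=bed, planted_on=on))

    def record_death(self, on: date) -> None:
        """Mark the plant as dead and close its current planting."""
        self.died_on = on
        if self.plantings and self.plantings[-1].removed_on is None:
            self.plantings[-1].removed_on = on

    def current_bed(self) -> Optional[str]:
        """Return the bed the plant occupies now, or None if dead or unplanted."""
        if self.plantings and self.plantings[-1].removed_on is None:
            return self.plantings[-1].bed
        return None


# Made-up sample values: a plant is moved between two beds and later dies.
plant = LivingAccession("2013.0001", "Rhododendron sp.")
plant.move_to("219", date(2013, 5, 1))
plant.move_to("230", date(2014, 3, 15))
bed_before_death = plant.current_bed()
plant.record_death(date(2015, 8, 2))
```

Keeping the full list of plantings, rather than only the latest bed, preserves the location history, which is the property that distinguishes a living collection from a collection of static objects.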
The team added two procedures to support business processes for plant propagation (e.g., the techniques and treatments applied to grow something from a seed) and the printing of pot tags (the little stakes one sees in potted plants at a garden store). The local deployment team also extended existing CollectionSpace functionality, including new development related to batch processing and event handling in order to support complex business logic at the Garden.

For several years, the UC Botanical Garden has used a business intelligence tool (Business Objects) for reporting and data analysis. In order to replace this reporting system, the team built numerous reports leveraging iReport, the open source reporting tool incorporated by the CollectionSpace project. A set of Postgres functions helps provide data for these reports. In addition, web applications were built that allow Garden staff to perform hierarchical searches across dimensions of the Botanical Garden collections data (e.g., "show me all plants in the genus Rhododendron located in beds 219 through 230"). More details about the report and web application development are available at http://wiki.collectionspace.org/display/deploy/UCBG+reports.

One of the most challenging aspects of launching a transactional system like this is the migration of data from the legacy system into the new one. Again, the UCB deployment team took full advantage of the advanced capabilities built into CollectionSpace, including the RESTful APIs and the CollectionSpace import service. Locally, the team had developed significant experience and tools during the two deployment projects that preceded this one. In conversation with staff in the Botanical Garden, significant data cleaning was performed that will have a lasting impact on the Garden's operations.
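A migration of this kind generally means pulling rows out of the legacy system, cleaning them, and producing payloads for import. The following is a minimal Python sketch of that pattern, not the team's actual tooling. The in-memory SQLite table stands in for the legacy database, and the table name, column names, sample rows, cleaning rule, and XML element names are all assumptions made for illustration; they do not reflect the legacy schema or the CollectionSpace import format.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Stand-in for the legacy database: an in-memory SQLite table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE legacy_plants (acc_no TEXT, taxon TEXT, bed TEXT)")
conn.executemany(
    "INSERT INTO legacy_plants VALUES (?, ?, ?)",
    [
        ("2001.0001", "  rhododendron occidentale ", "219"),
        ("2001.0002", "Quercus agrifolia", None),
    ],
)


def clean_taxon(raw: str) -> str:
    """Trim whitespace and capitalize the genus (a simple, assumed cleaning rule)."""
    parts = raw.split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def build_payload(rows) -> str:
    """Transform extracted rows into one XML document (hypothetical element names)."""
    root = ET.Element("records")
    for acc_no, taxon, bed in rows:
        rec = ET.SubElement(root, "record")
        ET.SubElement(rec, "accessionNumber").text = acc_no
        ET.SubElement(rec, "taxon").text = clean_taxon(taxon)
        # Emit a location only when the legacy row has one.
        if bed is not None:
            ET.SubElement(rec, "bed").text = bed
    return ET.tostring(root, encoding="unicode")


# Extract, then transform; loading into the target system is out of scope here.
rows = conn.execute(
    "SELECT acc_no, taxon, bed FROM legacy_plants ORDER BY acc_no"
).fetchall()
payload = build_payload(rows)
```

Separating the cleaning rule from the payload builder keeps each cleaning decision testable on its own, which matters when the rules are agreed on in conversation with collection staff.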
The team developed SQL queries to extract data from the legacy system and incorporated those queries into ETL (extract, transform, and load) jobs written in the open source Talend Open Studio platform in order to create the data payloads that were then imported into CollectionSpace. Numerous scripts were written and shared amongst several data developers to streamline this ETL work. Documentation related to data migration is available at http://wiki.collectionspace.org/display/deploy/UCBGCollectionSpace+data+mapping%2C+v2.4.

The project management methods and tools used in this deployment were based on best practices and experience from the two earlier deployments and from the CollectionSpace project itself. Key to this effort was close collaboration with staff in the Botanical Garden. At all stages of the project, they were involved in requirements gathering, documentation, data analysis, design, and testing. Openness was a key principle, with all information documented in a project wiki (http://wiki.collectionspace.org/display/deploy/UC+Botanical+Garden) and Jira. Software for all UC Berkeley deployment projects resides in GitHub, where CollectionSpace code is available as well (https://github.com/cspace-deployment).

With their migration to CollectionSpace, the UC Botanical Garden can now steward their collection in a robust collection management system that is stable, secure, and efficient. Importantly, it is ready for the future and can grow thanks to the modularity of CollectionSpace. In addition, they now have a much more usable, intuitive, and accessible system, one that is based on web standards and professional user-centered design principles and practices. Their former system was a client-server system (based on the X Window System) with screen and field behaviors that imposed a steep learning curve.
From the perspective of the service provider, UC Berkeley's Information Services and Technology division now has migrated three collection management systems to the new platform, one selected for its ability to drive down costs and reduce risk while supporting the excellence of these research collections. We have successfully demonstrated that CollectionSpace can be customized for very different kinds of collections and have extended built-in capabilities such as the RESTful API, batch processing and event handling. A public portal to the collection is being designed, an important step toward improving access to the collection. The stability of the collection management system and the openness of the system in general significantly enhance the research competitiveness of the Botanical Garden. More broadly, the team is now looking at opportunities to leverage information in the other CollectionSpace instances in order to drive discovery across different kinds of collections. With CollectionSpace in use for both the University and Jepson Herbaria and the Phoebe A. Hearst Museum of Anthropology, there are significant grant funding opportunities that are under discussion.

Measures of success for the CollectionSpace deployment project cover a range of areas related to data, system functionality, log analysis, customer satisfaction, and bug reports. Imported data were tested multiple times using both automated methods and extensive data testing by users and use cases. CollectionSpace logs were analyzed, and the problem rate is lower than for earlier deployments. Similarly, user reports of problems have been significantly lower than with earlier deployments, and the time needed to address those problems has been short (one hour to two days).

The UC Botanical Garden is a treasure for California and indeed the country. Students, faculty, researchers around the world, and the general public visit their collections.
Our project team has been proud to help position them for the present and beyond.

Technology Utilized in the Project:

CollectionSpace is based on open source web technologies: Postgres, Tomcat, Apache, Java, JavaScript, CSS, and HTML. The open source enterprise content management platform, Nuxeo, provides a middle layer of services and functionality (e.g., document handling and schema extension). CollectionSpace runs on Linux, Windows, and Macintosh. All UCB deployments run on Linux virtual machines in the campus data center, and deployment to our university IT infrastructure has been smooth. The CollectionSpace project is now building a set of tools to facilitate SaaS and cloud hosting.

Timeframe of the Implementation:

Serious analysis and project planning began in October 2012. The deployment was launched April 29, 2013.

Objective Customer Satisfaction Data:

From Holly Forbes, UC Botanical Garden Senior Museum Scientist:

"The UC Botanical Garden is pleased to have launched CollectionSpace in late April 2013. The extensive system developed for the Garden in large part by our colleagues in IS&T have done a remarkable job to make complicated sets of data relate to each other. We are very impressed with the extent of our positive interactions with and responsiveness from the programming team. They really worked hard to understand our data processing needs and work flow and came up with great solutions, including external software for reports. Part of these great interactions included training on the system and discovering how the system could be improved based on observations of our work flow in the new system. They really went the extra mile to incorporate our external systems of nursery and plant sale pot tag creation."