VuFind Beyond MARC: Discovering Everything Else
Demian Katz
VuFind Developer
demian.katz@villanova.edu

How VuFind Used to Work
• MARC records were loaded into Solr.
  – Data parsed to fields for searching/faceting.
  – Full binary record stored in “fullrecord” field.
• Solr was used for retrieving records.
• VuFind’s PHP code made heavy use of “fullrecord” data for building displays.

What’s wrong with that?
• MARC must die.
• Not all searchable documents are MARC.
• Code for pulling data from MARC is ugly.

Redesign Goals
• Centralize MARC-specific code so it can be easily replaced.
• Use stored Solr fields whenever possible.
• Allow arbitrary metadata formats to coexist peacefully.
• Make no assumptions about metadata content.

The Solution: Record Drivers
• A class interface for displaying a document retrieved from Solr.
• A new Solr field tells VuFind which Record Driver to instantiate for each document.
• A default Record Driver can be written to display a document based solely on stored Solr fields.

One Key Design Decision
• What should the Record Driver class contain?
  – Data-oriented methods (getTitle, getAuthor, etc.)
  – Screen-oriented methods (getSearchResult, getStaffView, etc.)

The Answer: All of the Above
interface RecordInterface
    public getSearchResult()
    public getStaffView()
class IndexRecord … implements RecordInterface
    protected getAuthor()
    protected getTitle()
    …
class MarcRecord extends IndexRecord
    protected getAuthor()
    protected getTitle()
    …

Record Driver Benefits
• Large-scale changes are possible.
• Small-scale changes are easy.
• Allows object-specific behaviors.
• Eases maintenance of local customizations.

Next Problem…
• Where’s the data?
• MARC records traditionally come from an ILS export.
• SolrMarc traditionally takes care of populating VuFind’s Solr index.

Growing the Toolkit
• The toolkit approach is important!
• Problems to solve:
  – Obtain records from remote sources
  – Process harvested files
  – Index arbitrary XML

Tool #1: OAI-PMH Harvester
• Purpose of tool: harvest metadata files from an OAI-PMH server into a directory.
• Key feature: ID manipulation.
• Key feature: delete support.

Tool #2: Batch Import Scripts
• Purpose of tool: process all metadata files in a directory.
• Easily achieved with Windows batch or Unix shell scripting.
• Several sample scripts ship with VuFind.

Tool #3: XSLT Importer
• Purpose of tool: with XSLT, map an XML document to a Solr document based on VuFind’s schema.
• Key feature: PHP integration.
• Key feature: Aperture support.
• Several sample XSLT documents ship with VuFind (DSpace, OJS, VuDL).

Parting Thoughts
• Understanding Record Drivers gives you a lot of control over VuFind.
• VuFind should be able to index practically anything with a bit of effort.
• Don’t be afraid to build your own tools!

More Information
• VuFind:
  – http://vufind.org
• Demian Katz:
  – demian.katz@villanova.edu
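
The Record Driver design from the earlier slides (an interface with screen-oriented methods, a default driver that reads only stored Solr fields, and a MARC driver that overrides the data-oriented methods) can be illustrated with a small sketch. This is not VuFind's code: VuFind is written in PHP, and the example below is a Python illustration under stated assumptions. The Solr field name `record_driver`, the `DRIVERS` registry, the `make_driver` helper, and the dictionary standing in for a parsed MARC record are all hypothetical.

```python
from abc import ABC, abstractmethod


class RecordInterface(ABC):
    """Screen-oriented methods that every record driver must provide."""

    @abstractmethod
    def get_search_result(self) -> str: ...

    @abstractmethod
    def get_staff_view(self) -> str: ...


class IndexRecord(RecordInterface):
    """Default driver: builds displays from stored Solr fields only."""

    def __init__(self, fields: dict):
        self.fields = fields

    # Data-oriented methods (protected by convention).
    def _get_title(self) -> str:
        return self.fields.get("title", "")

    def _get_author(self) -> str:
        return self.fields.get("author", "")

    # Screen-oriented methods are written once, on top of the data methods.
    def get_search_result(self) -> str:
        return f"{self._get_title()} / {self._get_author()}"

    def get_staff_view(self) -> str:
        return "\n".join(
            f"{name}: {value}"
            for name, value in sorted(self.fields.items())
            if name != "fullrecord"
        )


class MarcRecord(IndexRecord):
    """MARC driver: overrides only the data-oriented methods."""

    def __init__(self, fields: dict):
        super().__init__(fields)
        # Assumption: "fullrecord" is already parsed into a dict keyed by
        # tag + subfield. Real code would parse the binary MARC record here.
        self.marc = fields.get("fullrecord") or {}

    def _get_title(self) -> str:
        return self.marc.get("245a") or super()._get_title()

    def _get_author(self) -> str:
        return self.marc.get("100a") or super()._get_author()


# Hypothetical registry: value of the driver-selecting Solr field -> class.
DRIVERS = {"index": IndexRecord, "marc": MarcRecord}


def make_driver(doc: dict) -> RecordInterface:
    """Pick a driver from the (hypothetical) 'record_driver' Solr field."""
    driver_class = DRIVERS.get(doc.get("record_driver"), IndexRecord)
    return driver_class(doc)
```

Because the screen-oriented methods live in the base class and call the data-oriented ones, supporting a new metadata format in this sketch means overriding a few getters and adding one registry entry; documents with no recognized driver value fall back to the stored-field driver.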