Developing a Generic Toolkit: Architecture and technology issues
ALLC/ACH Conference 2003
Chris Turner
Copyright, UCL
LEADERS: Linking EAD to Electronically Retrievable Sources

Generic Criteria
• Achieving reusability through:
  – System independence
  – Standardisation
  – Availability & cost
  – Support
  – Sustainability

TVS Model
• The TVS model (Carvalho & Cordeiro, 2002) proposes a structured framework for exploiting XML technologies:
• Transport – data exchange and transfer between systems
• Validation – structure, semantics, basic data typing
• Services – reuse of search and display functionality

Web Services
• Web Services is an umbrella term for a set of standards governing communication between separate software programs. They define how data and instructions are exchanged between programs that may run on different computers, under different operating systems, and be written in different languages. Web Services use XML as the message format and, typically, HTTP as the transport protocol.
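The Validation layer of the TVS model covers structure, semantics and basic data typing. The Python sketch below illustrates those three kinds of check on a simplified EAD record; it is not the LEADERS validation code. The required element paths, the `validate` function and the date rules are assumptions for illustration only; a real deployment would validate against the full EAD DTD or schema with a validating parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical structural rules: a small, simplified subset of EAD,
# not the full EAD DTD/schema that a real validator would use.
REQUIRED_PATHS = ["eadheader/eadid", "archdesc/did/unittitle", "archdesc/did/unitdate"]

# unitdate/@normal holds an ISO 8601 date or a "start/end" range; this sketch
# accepts only YYYY, YYYY-MM or YYYY-MM-DD on each side (an assumption).
ISO_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def validate(xml_text):
    """Return a list of problems found; an empty list means the record passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    errors = []
    # Structure: the root must be <ead> and the required elements must exist
    if root.tag != "ead":
        errors.append(f"unexpected root element: {root.tag}")
    for path in REQUIRED_PATHS:
        if root.find(path) is None:
            errors.append(f"missing element: {path}")

    for unitdate in root.iter("unitdate"):
        normal = unitdate.get("normal")
        if normal is None:
            continue
        parts = normal.split("/")
        # Basic data typing: each side of the range must look like an ISO date
        if len(parts) > 2 or not all(ISO_DATE.match(p) for p in parts):
            errors.append(f"bad unitdate/@normal: {normal}")
        # Semantics: a range must not end before it starts (compared by year)
        elif len(parts) == 2 and int(parts[0][:4]) > int(parts[1][:4]):
            errors.append(f"date range runs backwards: {normal}")
    return errors
```

Returning a list of messages rather than stopping at the first error lets an encoder fix every problem in one pass.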
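To make the Web Services description concrete, the following sketch builds and reads back a SOAP 1.1 request envelope with Python's standard library and wraps it in an HTTP POST request object. Only the SOAP envelope namespace and the `text/xml`/`SOAPAction` conventions come from SOAP 1.1; the service namespace `urn:example:leaders`, the operation `searchByName`, its parameters and the endpoint URL are hypothetical. The LEADERS services themselves were written in Java.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # standard SOAP 1.1 namespace
SVC_NS = "urn:example:leaders"  # hypothetical service namespace

ET.register_namespace("soapenv", SOAP_ENV)
ET.register_namespace("ldr", SVC_NS)


def _local_name(tag):
    """Strip the "{namespace}" prefix that ElementTree puts on qualified tags."""
    return tag.split("}", 1)[1]


def build_request(operation, **params):
    """Serialise an operation call and its parameters as a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SVC_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def read_request(envelope_xml):
    """Parse an envelope back into (operation name, {parameter: text})."""
    envelope = ET.fromstring(envelope_xml)
    call = envelope.find(f"{{{SOAP_ENV}}}Body")[0]  # first child of Body is the call
    return _local_name(call.tag), {_local_name(child.tag): child.text for child in call}


def make_http_request(url, envelope_xml, soap_action=""):
    """Wrap the envelope in an HTTP POST; SOAP 1.1 uses text/xml and a SOAPAction header."""
    return urllib.request.Request(
        url,
        data=envelope_xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8", "SOAPAction": f'"{soap_action}"'},
        method="POST",
    )
```

No request is sent here; passing the `Request` object to `urllib.request.urlopen` would post it to the endpoint. `read_request` recovers the operation name and parameters, roughly what a server-side SOAP toolkit does before dispatching to a service method.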
System Components
• User – PC/Mac/X, Web Browser
• Server – Linux/Unix/Windows
  – LEADERS application
  – LEADERS Toolkit
  – Digitised/Encoded Resources

Digitised/Encoded Resources
• EAD XML file – contains metadata, including index terms, about the entire collection and its resources:
  – Original documents
  – Transcripts
  – Images
• EAC XML files – descriptive metadata about people, organisations and families
• TEI XML files – transcripts of original documents
• Image JPEG files – digitised images of original documents

Toolkit
• LEADERS Application
• WSDL – XML file describing the services, used by the application
• SOAP – XML messages generated to carry requests and responses between the application and the Toolkit
• Services, written in Java:
  – Search name/place/topic indexes – returns a browse list containing the nearest match plus the four entries above and below it in the index
  – Search by name, place, topic and date, individually or in combination – returns a hitlist
  – Search by id – returns a detailed object display
• Search Engine
• XML Index document – derived from the EAD finding aid by a stylesheet
• Digitised/Encoded Resources

Toolkit
• Components:
  – Server environment: Apache Cocoon
  – Parser, processors, etc.: JAXP (Sun), dom4j, Xerces, Xalan, FOP
  – Search engine: Lucene
  – System-neutral; open source; mostly supported by the Apache Software Foundation

Toolkit
• Reusability:
  – Can be applied to other resources encoded according to the same schema rules.
  – Services can be consumed by multiple applications.
  – Services may be added or extended.
  – Can be hosted on Windows/Unix/Linux.

Application
• Hosted by Apache Tomcat
• ‘Consumes’ Web Services
• Components:
  – JavaServer Pages
  – Stylesheets
  – Cascading Style Sheets

Application
• Reusability:
  – The same application can be used to access different resources served by the Web Services.
  – New applications can be created to consume services that access the same or different resources, e.g.:
    • Integration with an educational application
    • Study of palaeography
    • Focus on biographical or authority information
    • Online ordering of images/offprints/document production

Summary
• Reusability at all levels:
  – Resource files may be reused as stand-alone files and/or as components in one or more LEADERS applications.
  – The Toolkit can support multiple resource sets and multiple applications in multiple environments.
  – Applications can access multiple resource sets in multiple environments.

LEADERS Demo application
– An instance of a LEADERS application.
– Concentrates on the detailed presentation of resources rather than on the search interface.
– Built to gather user feedback.
– Example screenshots of the demonstrator application.
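The Toolkit slides describe an XML index document derived from the EAD finding aid by a stylesheet. The sketch below swaps in plain Python (`xml.etree.ElementTree`) instead of a stylesheet to show the same idea: walk the `<c>` components of a finding aid, collect the index terms from each component's `<controlaccess>` block, and emit one index entry per term. The `<index>`/`<entry>` vocabulary, the `build_index` name and the mapping of EAD elements to name, place and topic indexes are assumptions, and only unnumbered `<c>` components are handled.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from EAD index-term elements to the Toolkit's three index types
TERM_TYPES = {
    "persname": "name",
    "famname": "name",
    "corpname": "name",
    "geogname": "place",
    "subject": "topic",
}


def build_index(ead_text):
    """Derive a flat index document from an EAD finding aid (unnumbered <c> only)."""
    root = ET.fromstring(ead_text)
    index = ET.Element("index")  # hypothetical index vocabulary
    for comp in root.iter("c"):
        comp_id = comp.get("id")
        if comp_id is None:
            continue  # entries must point back to an addressable component
        # Only this component's own <controlaccess> children, not its subcomponents'
        for access in comp.findall("controlaccess"):
            for term in access:
                kind = TERM_TYPES.get(term.tag)
                if kind and term.text:
                    entry = ET.SubElement(index, "entry", type=kind, ref=comp_id)
                    entry.text = " ".join(term.text.split())  # normalise whitespace
    return index
```

Each `ref` attribute keeps the link back to the described component, which is what allows a search by id to return a detailed object display.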
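The name/place/topic index search returns a browse list containing the nearest match plus the four entries above and below it. One way to build that window is a binary search over the sorted index terms, sketched below. Treating the "nearest match" as the first term that sorts at or after the query (case-insensitively) is an assumption, and the `browse` function name is hypothetical.

```python
from bisect import bisect_left


def browse(sorted_terms, query, context=4):
    """Return (window, offset): the nearest match plus up to `context` terms
    either side of it, and the offset of the nearest match inside the window.

    `sorted_terms` must already be sorted case-insensitively. The "nearest match"
    is taken to be the first term sorting at or after the query (an assumption).
    """
    if not sorted_terms:
        return [], None
    keys = [term.casefold() for term in sorted_terms]  # rebuilt per call for brevity
    pos = bisect_left(keys, query.casefold())
    pos = min(pos, len(keys) - 1)  # a query past the last term browses the end of the index
    start = max(0, pos - context)
    end = min(len(sorted_terms), pos + context + 1)
    return sorted_terms[start:end], pos - start
```

Near either end of the index the window is simply truncated rather than padded. A production index would keep the folded keys precomputed, or take neighbouring terms from the search engine's own sorted term dictionary, rather than rebuilding them on every query.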
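The combined search (name, place, topic and date, individually or in combination) and the search by id can be sketched as simple filters over index records. This is an in-memory illustration only: the `Record` fields, the AND-combination of criteria and the year-range matching are assumptions, and the Toolkit itself used Lucene as its search engine.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical index record for one described item."""
    id: str
    title: str
    names: set = field(default_factory=set)
    places: set = field(default_factory=set)
    topics: set = field(default_factory=set)
    year_from: int = 0
    year_to: int = 0


def search(records, name=None, place=None, topic=None, year=None):
    """Return a hitlist of (id, title); every criterion supplied must match (AND)."""
    hits = []
    for rec in records:
        if name is not None and name not in rec.names:
            continue
        if place is not None and place not in rec.places:
            continue
        if topic is not None and topic not in rec.topics:
            continue
        # A year matches if it falls inside the record's date range (inclusive)
        if year is not None and not (rec.year_from <= year <= rec.year_to):
            continue
        hits.append((rec.id, rec.title))
    return hits


def get_by_id(records, record_id):
    """Search by id: return the full record for a detailed display, or None."""
    return next((rec for rec in records if rec.id == record_id), None)
```

Omitting a criterion leaves it unconstrained, so a single `search` function serves both individual and combined queries.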