Pedro 2.0 Design Document
written by Kevin Garwood
edited and formatted by Chris Garwood

Table of Contents

1 Introduction
2 The Pedro Project Vision
2.1 The Tools
2.1.1 Pedro
2.1.2 Pierre
2.2 Shared Aspects of the Tools
2.3 Using the Tools in Concert
3 Project Environment
3.1 Project History
3.2 Design Forces
3.3.1 Towards a Model Driven Approach
3.3.2.1 Constraining the Task
3.3.2.2 Constraining the Front End
3.3.2.2 Constraining the Back End
3.3.2.2 Constraints on Programming Languages and Generation of Applications
4 Pedro Architecture
4.1 Model-Driven Aspects
4.2 Service Based Aspects
4.2.1 Service Anatomy
4.2.1.1 Task
4.2.1.2 Scope of Effect
4.2.1.3 Persistence
4.2.2 Access to Application Variables
4.2.3 Service Types
4.2.3.1 General Services
4.2.3.2 Specialised Services
4.2.3.2.1 Validation Services
4.2.3.2.2 Ontology Services
4.2.3.2.3 ID Generator Service
5 Description of Subsystems
5.1 Schema Reader
5.1.1 Purpose
5.1.2 Description
5.1.3 Design History
5.1.3 Scope of Effect
5.1.4 Relevant Code Packages
5.2 Native Data Structures
5.2.1 Description
5.2.1.1 RecordModel Properties
5.2.1.2 DataFieldModel Properties
5.2.1.3 EditFieldModel Properties
5.2.1.4 IDFieldModel Properties
5.2.1.5 GroupFieldModel Properties
5.2.1.6 BooleanFieldModel Properties
5.2.1.7 ListFieldModel Properties
5.2.2 Design History
5.2.3 Scope of Effect
5.2.4 Relevant Code Packages
5.3 Pedro Contexts
5.3.1 Purpose
5.3.2 Description
5.3.3 Design History
5.3.4 Scope of Effect
5.3.5 Relevant Code Packages
5.4 Validation Services
5.4.1 Purpose
5.4.2 Description
5.4.3 Design History
5.4.4 Scope of Effect
5.4.5 Relevant Code Packages
5.5 Ontology Services
5.5.1 Purpose
5.5.2 Description
5.5.2.1 Basic Data Structure: Ontology Term
5.5.2.2 Ontology Provenance
5.5.2.3 Ontology Services
5.5.2.4 Ontology Source
5.5.2.5 Ontology Viewer
5.5.2.6 Default Viewer’s Use of Introspection on Ontology Sources
5.5.2.7 The OntologyContext Object
5.5.2.8 A Walkthrough for Selecting an Ontology Term
5.5.3 Design History
5.5.3.1 Decoupling Controlled Vocabularies from Data Models
5.5.3.2 Support for Stub Ontologies for Rapid Prototyping
5.5.3.3 Basing the Framework on Identifiers Instead of Word Phrases
5.5.3.4 Supporting Multiple Formalisms
5.5.3.4 Decoupling Aspects of Model and View in an Ontology Service
5.5.3.5 Consider Local and Remote Ontology Sources
5.5.3.6 Accommodate Updating in Ontology Sources
5.5.3.7 Provide Meta Data about Ontology Services
5.5.4 Scope of Effect
5.5.5 Relevant Code Packages
5.6 ID Generator Services
5.6.1 Purpose
5.6.2 Description
5.6.3 Design History
5.6.4 Scope of Effect
5.6.5 Relevant Code Packages
5.7 Plugins
5.7.1 Purpose
5.7.2 Description
5.7.3 Design History
5.7.4 Scope of Effect
5.7.5 Relevant Code Packages
5.8 Configuration System
5.8.1 Purpose
5.8.2 Description
5.8.2.1 Pedro Configuration Tool
5.8.2.2 ConfigurationReader
5.8.2.3 Other Configuration Files
5.8.3 Design History
5.8.4 Scope of Effect
5.8.5 Relevant Code Packages
5.9 IO
5.9.1 Purpose
5.9.2 Description
5.9.3 Design History
5.9.3.1 Use of Layers
5.9.3.2 Changing Parsers
5.9.3.3 Support for Streams
5.9.3.4 Creating the “Export to Final Submission” Feature
5.9.3.5 Providing Support for the Meta Data Layer
5.9.3.6 Merging dataImport and IO Class Packages
5.9.4 Scope of Effect
5.9.5 Relevant Code Packages
5.10 Alerts
5.10.1 Purpose
5.10.2 Description
5.10.3 Design History
5.10.4 Scope of Effect
5.10.5 Relevant Code Packages
5.11 Meta Data System
5.11.1 Purpose
5.11.2 Description
5.11.2.1 Walkthrough for Capturing Ontology Term Meta Data
5.11.2.2 The Pedro Meta Data Editor
5.11.3 Design History
5.11.4 Scope of Effect
5.11.5 Relevant Packages
5.12 Form Generation Facilities
5.12.1 Purpose
5.12.2 Description
5.12.2.1 General Classes for Generating Desktop Pedro Forms
5.12.2.2 Classes for Generating Edit Fields in Desktop Pedro Forms
5.12.2.3 Classes for Generating List Fields in Desktop Pedro Forms
5.12.2.3 Classes for Generating Forms in Tablet Pedro
5.12.3 Design History
5.12.4 Scope of Effect
5.12.5 Relevant Code Packages
6 Extending the Core Code Base
6.1 Replacing the schema parser
6.2 Adding an extra data layer
6.3 Creating a new field view
6.4 Adding Form Properties
6.5 Creating a Web-based Version of Pedro
6.6 Upgrading to Higher Versions of Java
7 Future Enhancements
7.1 Replacing the Schema Reader’s MSV Parser with Castor
7.1.1 Description
7.1.2 Suggested Approach
7.1.3 Scope of Effect
7.2 Auto-generate Functional Specifications
7.2.1 Description
7.2.2 Suggested Approach
7.2.3 Scope of Effect
7.3 Generate “Test” Feature
7.3.1 Description
7.3.2 Suggested Approach
7.3.3 Scope of Effect
8 Overview of Code Packages
8.1 Package “pedro.configurationTool”
8.2 Package “pedro.desktopDeployment”
8.3 Package “pedro.io”
8.4 Package “pedro.mda.config”
8.5 Package “pedro.mda.model”
8.6 Package “pedro.mda.schema”
8.7 Package “pedro.metaData”
8.8 Package “pedro.soa.alerts”
8.9 Package “pedro.soa.id”
8.10 Package “pedro.soa.ontology.provenance”
8.11 Package “pedro.soa.ontology.sources”
8.12 Package “pedro.soa.ontology.views”
8.13 Package “pedro.soa.plugins”
8.14 Package “pedro.soa.security”
8.15 Package “pedro.soa.validation”
8.16 Package “pedro.tabletDeployment”
8.17 Package “pedro.util”
8.18 Package “pedro.system”
8.19 Package “pedro.soa”
8.20 Package “pedro.workBench”
9 Index
Appendix A: Schema for the Pedro Configuration Tool
A.1 Configuration Options for Menu Features
A.1.1 Class: “menu_features”
A.1.2 Class: “existing_menus”
A.1.3 Class: “file_menu”
A.1.4 Class: “edit_menu”
A.1.5 Class: “options_menu”
A.1.6 Class: “view_menu”
A.1.7 Class: “help_menu”
A.1.8 Class: “help_document”
A.1.9 Class: “plugin”
A.1.10 Class: “custom_menu”
A.2 Configuration Options for Record Structures
A.2.1 Class: “schema_concept_field”
A.2.1 Class “record”
A.2.2 Class “list_field”
A.2.3 Class “edit_field”
A.2.4 Class “attribute_field”
A.3 Configuration Options for Service Classes
A.3.1 Class “service_class”
A.3.2 Class “ontology_service”
A.3.3 Class “list_field_editing_service”
A.3.4 Interfaces Implemented by Service Classes and Plugins
Appendix B: Mapping XML Schema Attributes to Application Properties
Appendix C: Schema for the Pedro Meta Data Editor
C.1 Class: “pedro_meta_data”
C.2 Class: “record_meta_data”
C.3 Class: “field_meta_data”
C.4 Class: “ontology_term”
Appendix D: Summary of Design Decisions and Historical Influences for the Pedro Project

1 Introduction

This document explains how the Pedro code base works. It is a critical part of the vision for this project that developers have the freedom to download and modify the core architecture to suit their own needs. We feel that the tools will not reach a broad audience unless developers are confident that they can re-brand the product to suit their own use cases. Our hope is that if the suite of tools is explained well enough, it will stimulate the interest of best-of-breed developers to help us provide a group of free, open-source software tools that can manage complex data.
The developer manual emphasises how parts of the core application work; it is not a tutorial on how to write plugins. For more information on how to write code modules for Pedro, please consult the document “Developer Tutorial”. The document was written in a serial manner so it could be printed out easily. We’ve also converted the same manual into a collection of HTML files.

The manual begins with a description of the Pedro Project in general. Although the focus of the discussion is on Pedro, it is important to understand how much of the code base is re-used in related applications that support other kinds of activities. This description also provides a road-map for future development, which should help projects evaluate whether the tools are appropriate for their needs.

The discussion about the vision for the project is followed by a section which describes the project history. In the world of academic research software, Pedro is rare in that most of the enhancements, suggestions and bug reports have come from groups other than the original group of molecular biologists it was funded to support. Its generic approach for generating forms has allowed multiple independent domains to benefit from sharing the costs of testing the software. Finally, the involvement of other developers has tested the architecture. The design history is interspersed with a collection of key historical influences and design decisions which are summarised at the end of the document. These highlights may help inform the design of other products.

Section 4 gives a high-level view of the architecture. It is intended to give developers an understanding of the general role of system components and the ways they work together. Section 6 describes how the core code base could be extended to support additional features. Section 8 summarises the individual code packages. It is intended to help guide developers as they navigate through the hundreds of classes that are part of the source code.
For more information about individual classes, please consult the Java Docs that come with the download. They are not completely filled in, but they do include a summary of what each class does.

The design document comes with four appendices which cover the following themes:
- the XML schema that drives the Pedro Configuration Tool
- a description of how fragments of XML schemas are used to configure native Pedro data structures
- the XML schema that drives the Pedro Meta Data Editor
- a summary of design decisions that have defined the architecture

The design document was written by Kevin Garwood and edited and formatted by Chris Garwood. We both welcome your suggestions.

2 The Pedro Project Vision

The Pedro Project is a collection of model-driven software tools which are intended to support simple data management tasks. The project was designed as a tool suite with two components: Pedro and Pierre. Pedro is a model-driven system for creating data entry applications. It was first released in February 2003 and continues to be maintained. Pierre is another model-driven system, which generates front ends that can search and retrieve data from a data repository. The tool was first released in April 2006.

The Pedro Project Vision is to use a model-driven approach for creating a suite of software tools that can provide basic facilities for managing complex data sets. Originally developed to support the activities of cash-poor molecular biology labs, it has shown promise in areas of clinical informatics, epidemiology and grid computing.

One of the main driving forces for the development team is to make the Pedro tools able to support data management needs in the developing world. The software is free, open-source and cross-platform, and is intended to be adapted by a variety of local communities using a minimum of skilled software developer resources.
This section introduces each of the tools, describes their common characteristics, and shows how they could be used together to maintain a data repository.

2.1 The Tools

2.1.1 Pedro

Pedro is a system which generates data entry forms to suit concepts defined in an XML Schema. End-users enter data through the forms to produce XML-based data sets which will validate against the schema. Figure 2-1 shows an example of a generated Pedro application:

Figure 2-1: An example Pedro Application

The tool promotes high data quality through features which support guided data entry and validation services. One of its most sophisticated features is its support for marking up form fields with terms that come from one or more ontologies. The appearance and functionality of generated forms can be customised via the Pedro Configuration Tool (Figure 2-2).

Figure 2-2: The Pedro Configuration Tool

Using the model-driven approach, Pedro uses data models to generate the generic functionality and relies on a family of plugins to support domain-specific functionality. Validation services and general purpose plugins can be developed which have effects at the document, record and field levels of data entry. Other mark-up services can be developed for field-level entry. The services can be linked with the parts of the application via the Pedro Configuration Tool.

The tool assumes separate roles for Data Modeller and Programmer. The following assumptions are made about Data Modellers:
- they are domain experts
- they are not expected to know how to write software
- they use the Pedro Configuration Tool to generate test applications for an end-user community.

Assumptions made about Programmers are:
- they know how to write programs
- they are not expected to be domain experts
- they write domain-specific plugins which customise Pedro for a given use case

2.1.2 Pierre

Pierre is a model-driven system for generating applications that search and retrieve information from a data repository.
The system can generate multiple front-ends which can interact with an abstract repository that is implemented using technologies such as relational, XML or object-oriented databases.

The system assumes separate roles for Repository Designer and Service Designer. The following assumptions are made about Repository Designers:
- they are programmers
- they are not expected to be domain experts
- they write queries and reports that work with a given data repository

Service Designers have these characteristics:
- they are domain experts
- they are not expected to be programmers
- they rapidly prototype a specification for a service by generating test applications for end-users.

These roles are essentially the same as those developed in Pedro, except the roles for Pierre are more specific. Service Designers are not so much designing a single application as a service which can be used to generate multiple kinds of search applications at once. Repository Designers are programmers who spend most of their effort writing database queries that satisfy the needs of the user community.

Building a service begins when the Repository Designers and Service Designers agree on an XML schema which describes the concepts a data repository can publish to the outside world. The schema forms a kind of broad contract that allows them to work independently of one another. While the Repository Designers construct or modify their database, the Service Designers can begin using the Pierre Configuration Tool (Figure 2-3).

Figure 2-3: The Pierre Configuration Tool for Pierre v2.0 (current release is Pierre 1.0a)

The Service Designers use the tool to create a specification for a search and retrieve service. Most of the queries are defined in terms of concepts that have been defined in an XML schema. The designers can use the tool to generate a prototype application and use this to elicit feedback from end-users (Figure 2-4).
Figure 2-4: A test application generated by Pierre

During this rapid prototyping phase the test application is linked to a dummy data repository which returns junk data results. This is done to provide an idea of what to expect from a real data repository (Figure 2-5).

Figure 2-5: Junk data results for the query made in Figure 2-4.

When the service is completed, the Service Designers can generate a functional specification which can inform the Repository Designers about what queries need to be supported (Figure 2-6).

Figure 2-6: Functional specifications for search and retrieval service generated by Pierre.

The Repository Designers then write code for the queries and fulfil the user requirements. Once the repository has been finished, the Service Designers can replace the dummy repository with the live one (Figure 2-7).

Figure 2-7: A test application generated by Pierre that interacts with a live data repository

When all the designers and the end-users are happy with the service, the Service Designers can use the Pierre Configuration Tool to automatically generate multiple front ends which interact with the same repository (Figure 2-8). These front ends include a command line service; a text-based menu-driven application; a standalone GUI application; and a web application.

Figure 2-8: Generating multiple front ends with Pierre.

2.2 Shared Aspects of the Tools

Pedro and Pierre will use the same engine for interpreting XML schemas, but will associate different properties with each concept:
- Pedro associates concepts with properties of data entry functionality
- Pierre associates concepts with properties of data dissemination functionality

As well as sharing the same engine to interpret schemas, most of the user interfaces for the utility and configuration tools will be generated using Pedro. For example, the Pedro Configuration Tool is an instance of a generated Pedro application which is customised with plugins that help Data Modellers.
Pedro’s Meta Data Editor is another generated application which allows Data Curators to modify the meta-data that is kept for each data file. When Pierre 2.0 is released, the Pierre Configuration Tool will be another instance of Pedro that is customised to generate data dissemination services. Figure 2-10 shows the family of the products that will all use Pedro as the basis for data entry: Figure 2-10: Tools for the Pedro Project that will use customised versions of Pedro to support data entry tasks. These include the generated Pedro applications for end-users; the Pedro Configuration Tool; the Pedro Meta Data Editor; the Pierre Configuration Tool. Reusing Pedro in these ways helps test the core code base and reduces the amount of new code that needs to be developed and tested. Pedro and Pierre share many common form features. They both support field and record-level validation services. They also use the same system for marking up form fields with ontology terms. The Pedro Ontology Service Framework is a shared subsystem that allows end-users to mark-up forms with terms from one or more ontologies. The shared form features mean that the data quality of query submission will be as good as the quality of data curation. There are other examples of code being reused in the tool suite. For example, the Pedro Alerts Editor allows users to associate a set of matching criteria with an intent such as an error or a warning. The UI for defining the matching criteria is found in two other places: the advanced search feature of the Pedro Configuration Tool the advanced search feature in the standalone application generated by Pierre. Another aspect all three tools share is support for both rapid-prototyping and deployment phases of development. They are designed so that changes in the schema or the service description can be automatically propagated to the applications. This allows service designers to rapidly elicit feedback from end-users via auto-generated applications. 
Projects can choose to limit their use of the tools to gathering requirements for software they will create themselves. Alternatively, they can choose to use the generated applications in a production setting. The use of a model-driven approach allows developers to control the extent to which they commit to using a new technology. Although the tools share many features and parts of the same code base, they are intended to be marketed as independent applications. For example, Pedro does not require users to know about Pierre. Pierre will rely on code used in Pedro but users are not expected to download Pedro to make the other two tools work. Marketing the tools independently is another way of allowing developers to limit their investment in the technology. The limited remit of each tool allows them to be used as lightweight components in a larger system. 2.3 Using the Tools in Concert The tools could be used together in a use case scenario that has two phases. During the data entry phase of deployment, end-users could use Pedro to create XML data sets. Once the data sets have been created, they can be used to create a data repository. The XML files could be placed in an XML repository such as eXist. Alternatively, developers could create scripts which extract specific fields from the data sets to make custom purpose repositories. This approach could have a number of benefits. First, sensitive data not related to the expected use of the repository could be left out. Second, the original data sets are preserved, which provides a form of backup. Third, databases could be heavily optimised for certain types of queries. One difficulty we observe in bioinformatics repositories is they tend to include a vast amount of machine-generated data that are relevant for analysis tasks but are not relevant for search queries. Pierre could be used to provide end-users with multiple front-ends that interact with the same data repository. 
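The extraction scripts mentioned earlier in this section could take many forms. The following is a minimal sketch, in Java, of one way to copy only selected fields from an XML data set, so that anything not explicitly listed (for example, sensitive fields) is left out of the custom repository. The class name FieldExtractor, the use of XPath expressions to name fields and the element names in the usage note are assumptions made for illustration; they are not part of Pedro.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper: copies only the listed fields out of an XML data set. */
class FieldExtractor {

    /**
     * Evaluates each XPath expression against the XML text and returns the
     * matched string values keyed by expression. Fields that are not listed
     * never reach the result, which is how sensitive data is left out.
     */
    static Map<String, String> extract(String xml, List<String> fieldPaths) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            Map<String, String> row = new LinkedHashMap<>();
            for (String path : fieldPaths) {
                // evaluate(String, Object) returns the node's text content,
                // or an empty string when nothing matches.
                row.put(path, xpath.evaluate(path, document));
            }
            return row;
        } catch (Exception e) {
            // Wrap parser and XPath failures so callers need no checked handling.
            throw new IllegalStateException("Could not extract fields", e);
        }
    }
}
```

For a data set such as `<experiment><sample><name>S1</name><patientId>P-9</patientId></sample></experiment>`, calling `FieldExtractor.extract(xml, List.of("/experiment/sample/name"))` returns a single entry holding the sample name and omits the patient identifier. A real script would then write each returned row to whatever repository the project has chosen.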
The schema used to make a dissemination service would be limited to having those concepts which can be published. Repository Designers who manage large complex databases could leave out concepts which didn’t address specific queries, or which only existed to service the database. For example, the foreign key references in a relational database could be left out of the schema because they would not be useful query concepts for an end-user. 3 Project Environment 3.1 Project History Pedro is a software tool which was first released in February 2003 and has been maintained for the past three years. It is intended to be a model-driven data capture tool that can be used to create data sets in a number of domains. The tool is a generic software application that can be customised for domain-specific tasks in a number of ways. It is designed so that much of the data modelling and documentation can be done by a domain scientist who is not a software engineer. When software developers are needed to adapt the tool, their efforts can be limited to developing plugins that support domain-specific activities. There are a number of historical aspects of design which have helped make the tool suitable for supporting this use case scenario. They are summarised throughout the following description of the project’s development history. These points may be helpful in evaluating the suitability of the tool for a new project setting. The development of Pedro has been heavily influenced by a community of molecular biologists who want to standardise the format, structure and content of electronic data sets which describe their experiments. A growing number of projects in molecular biology are trying to express their experiment designs in terms of formal data models. Their hope is that making their electronic records comply with the model will lead to a greater level of uniformity in the data sets that appear in public data repositories. 
Creating model-compliant data sets can establish a level of data quality that makes the files easier to exchange between members of the same lab, members of different collaborating labs or between members of the broader bioinformatics community. Furthermore, it is easier for research groups to develop analysis programs that scan data repositories when the data sets have a common structure. Throughout 2002, Manchester University was involved with the Proteomics Standards Initiative (PSI), a consortium of scientists who were developing a community data model that described proteomics experiments. Like many standards bodies, the PSI relied on a committee of volunteer participants who met on a semi-regular basis. They assessed a number of experiment use cases and began to develop a model that would formally describe them. Usually, the speed at which standards develop in bioinformatics is much slower than the speed at which individual labs produce and make changes to data sets. Often, the IT systems used to manage electronic laboratory data have to be adapted so that they export and import data in a form that complies with the new community standard. Laboratories tend to have limited access to software developers and by the time their systems comply with the new model, either the community standard or the local experiment model has evolved. This makes it difficult to create data repositories that have uniformly structured data sets. Manchester’s proteomics community began to feel that for a standard to be widely adopted, there would have to be evidence that it could be implemented with some kind of data capture tool. Historical Influence 1: The community of potential end-users wanted software that could produce data sets which complied with a formally defined domain model. The development of a data capture tool to suit a complex data model could take a long time to develop. Furthermore, Manchester’s proteomics group didn’t have much access to software engineers. 
Like many other molecular biology labs, they were funded to do research, not development. Development would be viewed as an overhead in developing a means to an end. In Autumn 2002, I was working as a contract programmer for the E-Science North West Centre. The ESNW fostered projects that helped groups apply technology developed at institutions in the North West. It remains a service-based organisation which tries to provide help to ongoing projects. One of their services was providing projects with help developing software applications. For the organisation, developing applications was an end in itself because its focus was on service provision, not research. Historical Influence 2: the software project was partly funded by the ESNW, an organisation whose remit was service provision, not research. The attitude of the organisation was that the software should support end-user activities. This helped make the development of Pedro different from other projects I’ve worked on, where the emphasis was on developing software that minimally met the needs of some research grant. I was assigned to the Pedro Project in September 2002. By this time, the project manager was Norman Paton and the main researcher was Chris Taylor. Chris is a geneticist who was heavily involved in developing models for the Proteomics Standards Initiative (PSI). He was also tasked with developing the software, but this second task presented two problems: He was a domain scientist but not a formally trained software engineer His duties for developing standards left little time to develop supporting software I was brought onboard the project for a period of five months, after which I would be reassigned to another activity. Chris Taylor had already been developing the PEDRo standard for many months. During that time, he had also developed a very primitive prototype of what some of the forms might look like. His work provided a list of requirements gathered over the course of the preceding year. 
The clarity of the requirements meant that I did not have to spend the time interviewing clients myself.

Historical Influence 3: a year of requirements gathering had been done prior to the initial development of the software tool.

Norman foresaw two things that would happen after I left the project, which form the next historical influence:

Historical Influence 4: Pedro would be a tool that would be maintained by domain scientists who were not trained software engineers. The tool would have to accommodate frequent changes made to the underlying data model.

The most efficient way to build the tool would be to make a number of bespoke application forms that supported a snapshot of the model. This would require the least amount of up-front design work and would yield a working prototype in the shortest amount of time. However, the most important aspect of the tool appeared to be its ability to be maintained and accommodate change. There was no way of assessing which part of the PEDRo standard would change in the coming year, so the application could not attach semantic significance to any of the model’s concepts. Norman suggested that I make a tool that generated forms based on a formal data model.

Historical Influence 5: To make the tool easy to maintain, it was designed using a model-driven approach.

The tool was designed to be independent of the data model it used to generate forms. This had two consequences:
- It allowed me to develop the tool without first acquiring competence in the proteomics domain
- It allowed Chris Taylor to develop the data model without requiring much knowledge about how the tool worked.

This effectively promoted aspects of parallel development for the same software tool. It created a separation of design concerns which suited the skill sets of a domain scientist and a software engineer. Chris Taylor used the XML Schema language to express the domain model.
He had to limit the structures he used because the tool wasn’t able to interpret all of them. However, the result was adequate to present the PEDRo model to end-users. He observed that many scientists were better at giving feedback on the model via the application forms than through a complex UML diagram. This helped to lower the skill set required for biologists to participate in the data modelling process.

The first prototype of the tool gained the interest of the MyGrid project. They wanted to use Pedro forms to create descriptions of bioinformatics services. In particular, they were interested in the tool’s ability to mark up form fields using key terms from one or more controlled vocabularies. The MyGrid team made suggestions that led to the development of Pedro’s Ontology Service Framework. This framework was published in a paper at the European Semantic Web Conference in 2004. MyGrid’s involvement represented the first influence of a domain outside of proteomics. Other groups would express interest in the coming months. In its first year and a half of release, the tool was used as a rapid prototyping tool to help validate complex models with end-users.

Historical Influence 6: The model-independent nature of the tool encouraged other domains to use it. Their feedback helped identify bugs, and led to new features which helped to service the user community the tool was initially commissioned to support.

Pedro eventually caught the interest of the EBI. They wanted to use the tool with other models that were too complicated for the tool to handle. To overcome this problem, Kai Runte was hired. He was responsible for developing a new schema reader which was based on Sun Microsystems’ MSV schema reader project. Through his work, he compelled me to explain how parts of the code base worked. This is a vital aspect of developing open source projects that work for other groups.
When he finished developing the schema reader, he tested it on a schema that was auto-generated from a DTD of the MAGE model. MAGE was a very complex model which described microarray experiments. Kai’s effort made Pedro able to support a much broader range of complex data models than it could support before. This helped make the tool more appealing to other domains. He was also the first external developer who was invited to make significant changes to the code base. This is an important event in the development of any open source project:

Historical Influence 7: Pedro’s ability to support other models was greatly improved by the work of another developer who was not funded by the Manchester proteomics group. The collaboration made the software code base more appealing for open-source project work.

I continued to make fixes to the code base long after my contract to serve the proteomics group had finished. The ESNW began to recognise that the tool could be applied to other domains, so it encouraged me to continue my interactions with multiple domains.

Historical Influence 8: the remit of the body funding the software development was broad enough to allow the tool to be applied and modified to suit multiple domains.

The proteomics groups at Manchester continue to focus their efforts on the development of standard models. There was a reluctance to deploy the tool in the domain until the model had stabilised. This meant that the bottleneck for software release was not the development of the software but the development of a particular model. These circumstances helped make it acceptable to consult other domains whose models were either simpler or more mature. The feedback these groups provided helped improve the tool as it would be used by proteomics scientists.

Historical Influence 9: the proteomics standards took so long to develop that the software team began to focus on testing the tool on domains which had simpler or more mature data models.
Eventually people began using the tool for data entry rather than simply as a rapid prototyping tool. This change in user habits necessitated a superior level of documentation, testing and end-user training that would make Pedro a production-quality tool. The problem was that being the main developer on the project, the program made sense to me. Therefore, I would think it made sense to people using it as well. This is what I refer to as the developer blind spot and it is why core developers should never be in charge of documenting their own products for end-users. In 2004, Chris Garwood joined the team and became responsible for helping to make the tool a product that could be used in day to day activities. He wrote a test plan, rewrote all training materials and evaluated the tool with end-users. This produced an important separation of roles on the project which would benefit the people using it: Historical Influence 10: another software engineer was brought in to make a testing plan, rewrite training materials and interact with end-users. His detachment from the code base gave him objectivity in evaluating how well the tool worked for users. It helped eliminate biases main programmers would exhibit in justifying their work to end-users. The success of Pedro led to a follow-on project called Pierre. Pierre applied the same model-driven philosophy to generate forms for search and browse query forms that interacted with a data repository. Much of the form generation activity was borrowed from the Pedro code base. Historical Influence 11: The Pierre Project was built using the Pedro code base. This helped improve the robustness and extensibility of core Pedro libraries. In the Spring of 2006, Chris fielded a request made by Jennifer Lynch, a mass spectrometer lab scientist who was working with the Manchester proteomics group. She liked the tool but wanted a version that would be able to work on some portable computing device. 
She explained how the program was difficult to use in the lab because it needed to be installed on a desktop. The constraints and safety regulations in a molecular biology lab make it difficult to transcribe data directly to a laptop or a desktop computer. After three interviews and ten business days, we produced Tablet Pedro, a version of the tool which would work on a Tablet PC. Over 90% of the Pedro code base was re-used, thus showing the value of using a model-driven approach that could generate forms for different kinds of deployments. The advent of Tablet Pedro has since attracted greater interest from wet labs and new interest from research projects that do data entry in rugged outdoor settings.

Historical Influence 12: A lab scientist guided the development of Tablet Pedro, which could be deployed on a Tablet PC. The development has shown that Pedro can be used in a laboratory, and it promises to attract the interest of other domain groups who gather data in remote areas. It also shows the program can be adapted to generate forms for alternate forms of display.

Since 2004, I’ve been interested in using Pedro to help service research projects in the developing world. I’ve gathered a number of requirements from interactions with organisations involved with activities having this theme:
- The tool should support languages other than English
- The tool should support different kinds of form fields which may describe images, audio and video clips.
- The tool should be documented well enough to not require my involvement
- The tool would have to show greater levels of customisation to suit other domains.

We’ve done a number of things to help meet these requirements:

3.2 Design Forces

This section describes the major design decisions which shaped the development of Pedro. Other minor design decisions are described under descriptions of subsystems.
3.3.1 Towards a Model Driven Approach Two major design assumptions motivated us to adopt a model-driven approach to creating a data entry tool. The first assumption was that the model would continue to evolve rapidly and that all model concepts were equally likely to change. At the onset of development, there would have been a high overhead making manual changes in the code to suit a model that was changing every week. We were concerned that if the initial prototype of the tool underwent too many manual changes, the end-product would be error-prone, hard to extend and unlikely to perform when it was used in a production setting. Therefore, this defined our first major design assumption: Design Assumption 1: the underlying data model will change and all model concepts are equally likely to change. The assumption caused us to design first and foremost to accommodate change in the model. This began a process of decoupling the model concepts from the ways they were visualised. Regardless of what model concepts were added, deleted or modified, the form fields were rendered in the same ways. For example, a radio button would always be rendered the same way, an integer field would produce errors if letters were typed in it and a list field would always have “New” and “Edit” buttons. Development effort focused on a small collection of form field rendering classes and these were tested independently of the model. The second design assumption characterised the developers who would maintain the product in the future: Design Assumption 2: the application would continue to be serviced by scarce developer resources. These people would likely be skilled domain experts but not trained software engineers. Applying this assumption helped us gauge the kind of software maintenance activities a typical developer would be capable of carrying out. 
We assumed their programming experience would be limited to the production of scripts that emphasised procedural rather than object-oriented programming. They would be unlikely to be able to fix bugs in, or maintain, the complex collection of classes. This was especially true of a largely uncommented code base that was the result of an initial prototype. Instead, the programming efforts would have to be limited to the production of modules which interacted with the rest of the program via well-defined interfaces. The content of these modules could retain the procedural style of programming to which they were accustomed.

Design Decision 1: Pedro will be developed using a model-driven approach.

Model-driven systems run off a model that describes properties of a software application. The decisions about what properties should be included in the model are at the discretion of the developers who are creating the systems. We realised we had to strike a balance between including enough configuration options to make Pedro applicable to a wide range of use cases, and keeping it simple enough to appeal to developers who were not necessarily trained software engineers. The more configurable an application is, the more use cases it can support. However, increasing the number of configurable options also steepens the learning curve for any developers who use the system. Using a model-driven system ceases to be feasible if the amount of effort to configure an application is as much as the effort needed to code one from scratch. We also had to consider how many developer resources were available to write code that supports the set of auto-generated features we wanted. Together, these forces motivated us to try to simplify the application model. We began by envisioning an idealised system for managing a data repository (Figure 3-1).

Figure 3-1: An example of an idealised system for managing a data repository.
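To illustrate the kind of module described under Design Assumption 2, the following is a minimal sketch of a plugin that talks to the core application only through a small interface, while the code inside the module keeps a simple procedural style. All of the names used here (FieldValidationService, MassRangeValidator, ValidationRunner) and the "mass" field are hypothetical; they are not taken from the Pedro code base.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical contract between the core application and a plugin module. */
interface FieldValidationService {
    /** Returns error messages for the given field value; empty when it is valid. */
    List<String> validate(String fieldName, String value);
}

/** A domain-specific module written in a plain, procedural style. */
class MassRangeValidator implements FieldValidationService {
    @Override
    public List<String> validate(String fieldName, String value) {
        List<String> errors = new ArrayList<>();
        if (!fieldName.equals("mass")) {
            return errors; // this module only checks the "mass" field
        }
        try {
            double mass = Double.parseDouble(value);
            if (mass <= 0) {
                errors.add("mass must be positive");
            }
        } catch (NumberFormatException e) {
            errors.add("mass must be a number");
        }
        return errors;
    }
}

/** Stand-in for the core: it knows the interface but nothing about the domain. */
class ValidationRunner {
    static List<String> run(List<FieldValidationService> services,
                            String fieldName, String value) {
        List<String> allErrors = new ArrayList<>();
        for (FieldValidationService service : services) {
            allErrors.addAll(service.validate(fieldName, value));
        }
        return allErrors;
    }
}
```

In this sketch a domain expert only needs to write the body of one `validate` method. The core never refers to proteomics concepts, so a change in the data model affects the plugin but not the code that calls it.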
Aspects of the system were evaluated and either removed from the scope of development, hard-coded or expressed as configuration options. The following subsections describe how we tried to limit the application model.

3.3.2.1 Constraining the Task

The most important way of simplifying the application model is to realise that the tasks of data capture, data dissemination, analysis and the provision of security services can each be addressed through separate applications. I speculate that the design of the typical repository shown in Figure 3-1 is influenced by the following key technology decisions:
- the application is deployed via the web because that medium is most popular for supporting search and retrieval services
- special technologies are used to enhance the usability of web forms. This is done to help support data capture activities
- data sets are managed in a data repository to support data dissemination and analysis activities
- the data repository is usually organised as a relational database to benefit analysis programs

I believe that the decision to support all the major tasks in one application commits each task to being supported by technologies that are better designed to support other activities. By resolving the tasks into separate applications, the overall application model is simplified and better technology choices are made for supporting individual tasks. The separation of tasks is shown in Figure 3-2:

Figure 3-2: Simplifying the application models by separating tasks

For the development of most repositories, the main tasks have the following ranking from most to least important:
1) data capture; to populate the repository
2) data dissemination; to search and retrieve parts or all of data sets that match selection criteria
3) analysis; to apply various algorithms to a large number of data sets.

The provision of security is usually considered in the initial stages of a project but is left last to be implemented.
Given the limited developer resources initially assigned to the project, the focus of the tool became data capture. Although the data capture tool would have provisions for plugins which could support other tasks, the core part of the architecture would not be designed to support these other tasks. Task Constraint 1: Pedro will be designed to support data capture tasks. Although it could have plugins that support other activities, its core architecture will not be designed to suit other tasks. Other activities such as data dissemination, analysis and the provision of security services will be dealt with in separate projects. The application to generate then becomes a data capture tool. Further constraints can be made to the front end user-interface and the back end storage of data sets. 3.3.2.2 Constraining the Front End Data repository applications can be deployed in a number of ways, including a web application, a standalone GUI application and a command-line service. Although the web is a popular form of deployment for data dissemination tasks, it is less suited to support complex data entry activities. This is due to the differences in habits between people who produce data sets and others who use them. In any given project, there will typically be a small number of data producers and a relatively large number of data consumers. Data producers will usually work either in the lab which manages the repository or for one of the lab’s partner organisations. The curators will spend long periods of time using data capture tools and will value usability more than accessibility in the software they use. Data consumers will usually be spread out over different locations around the world. They will use the repository sporadically and will typically spend a brief term trying to retrieve data sets that will match simple selection criteria. They will value accessibility over usability in the applications they use. 
Front End Assumption: people using data entry tools to record complex data will value usability more than accessibility.

The design of Pedro’s front-end had to consider usability first and accessibility second. Although Pedro was originally going to be developed as a web application, this deployment form was rejected in favour of a standalone GUI application. Three factors influenced this decision: usability, development time and performance.

Web forms tend to have poor usability because they are made with a limited collection of simple form objects such as labels, fields and buttons. Technologies have been developed to enhance the forms so they are almost as usable as the same forms would be in standalone GUI applications. However, the enhancements are not universally supported by the various browser clients such as Internet Explorer, Netscape, Mozilla and Firefox. Therefore, relying on these technologies to build data capture applications would invite platform dependencies that would undermine the web’s appeal for promoting widespread access to data. To maintain the aspect of universal access, web technologies for building the GUI would have to render plain HTML forms that would be supported in all browser client programs.

Web Technology Assessment 1: Web applications developed to promote widespread access to data should not rely on special technologies for rendering forms. They should use plain HTML forms that can be rendered by all browser client programs.

At the onset of the project, a number of web technologies were evaluated for creating the front end of the data entry tool. In bioinformatics, the use of Perl and CGI scripts is popular for making simple web forms. However, Perl is a language that is best suited for simple scripts. It lacks object-oriented features that would allow it to support large systems. Instead, Java-based technologies were considered because the language scaled better as applications became more complex.
Of the Java-based technologies, only those which generated plain HTML forms were considered. At the end of the evaluation, I decided that the best web technology candidate to use would be Jakarta’s Struts project. It combined the use of Struts libraries, Java Server Pages (JSP) and Java Servlets. This approach had a number of advantages:

- the framework supported arbitrarily complex applications better than other technologies
- the framework separated the model from the view aspects of an application; this suited a model-driven approach
- applications rendered HTML pages that could be displayed using any browser client
- it depended on a suite of technologies which were all written using Java.

Overall, this technology would have been the most suitable to use for rendering Pedro as a web application.

Web Technology Assessment 2: The Jakarta Struts project was the best web technology evaluated to render Pedro as a web application.

However, the Struts project also had a number of drawbacks:

- each form required screen presentation, action handler and business object layers. While this approach allowed the framework to support complex applications, it required tedious programming effort to coordinate the layers.
- the applications were difficult to test.

Any scheme for auto-generating the application would have to take into account the coordination of three layers. In contrast, generating an application with Java Swing screen objects was far easier to do and took less effort to design and test. The screen objects were more flexible to configure than simple web form objects and made the standalone GUI application more usable than a web application.

Finally, standalone GUI applications may provide better performance for end-users. Web applications are run within browser clients that render data on the screen. Some of these programs have difficulty rendering large data sets.
This is much less of an issue for standalone applications.

Front End Decision 1: Pedro will be developed as a standalone GUI application rather than as a web application.

3.3.2.2 Constraining the Back End

Data entry tools often don’t require the presence of a data repository. Data dissemination services require one in order to return part or all of data sets that match selection criteria. They usually focus on the few human-readable fields that describe each data set. Analysis services often require data repositories to be organised in ways that make data sets more amenable to computationally intensive operations. They focus on the large volumes of machine-generated data that appear in one or more data sets.

The needs of data dissemination and data analysis services are not shared by data capture services. A curator will typically edit only one or a few data sets in one session. Once the data sets are submitted for publication, they are not likely to be edited again. Therefore, curators who are editing a single data set do not require access to a large data repository.

Making a data capture tool require the presence of a data repository invites an overhead of technologies that are designed to suit other tasks. The requirement makes the model-driven approach more difficult because Pedro would have to generate code in some kind of database language such as SQL. Installing the generated application could be complicated by the need to install a database. The applications would also be tied to code library dependencies that were inherent in whatever database technology was used to manage the data sets.

Instead, Pedro should store data sets as XML documents that can be validated against the domain model. XML is a data format that is widely used in bioinformatics. Apart from making a model-driven approach easier to implement, the use of documents frees developers from relying on one monolithic database.
Programs could be developed which extract field values from a collection of master files and use them to populate specialised databases. For example, consider how a repository could be designed to suit a simple search and retrieval service. Only the metadata from each data set would need to be included in a database. Data sets which match selection criteria could be downloaded by the users, who could then view them using the data capture tool. Other use cases may require different parts of the master data files to be extracted. This flexibility allows developers to create databases that are optimised to suit different tasks. It also allows them to migrate from using one data repository technology to another.

Developers could write their own services to write data sets to different formats. However, the core architecture will use XML documents to encode data sets.

Back End Decision: Pedro will store data sets as XML-based documents. Through plugins, it can support committing data in other ways but the tool will not require the presence of a data repository.

3.3.2.2 Constraints on Programming Languages and Generation of Applications

Many model-driven systems allow designers to auto-generate application code in multiple target languages such as Java, C, C++ or Perl. Java was used to develop the system for a number of reasons:

- developer’s preference: Although I’ve done development in C, C++ and Perl, I had the most experience with Java. In the initial five-month time frame allotted to develop an initial prototype, Java represented the smallest learning curve to begin coding.
- in-house experience: the two most popular languages used to develop bioinformatics applications were Perl and Java. Using Java increased the likelihood that others could contribute to the project.
- scalability: Java is object-oriented and scales to large systems of classes. I felt Perl was good for scripts but poor for designing large complex systems.
Language Decision 1: the Pedro system and applications generated from it will be written using Java.

It seems that it is better to use application models to generate forms at run-time rather than rely on code-generation facilities. Generating code in multiple languages was not regarded as a priority because Java code will run and operate on all the machines used by prospective end-users. Eliminating code generation facilities removes the need to develop code to mechanically generate Java classes. It also removes the need to have configuration options associated with the activity appear in the application model. Auto-generated applications tend to be difficult for developers to understand or adapt. They would prefer to extend or modify a well-documented code base that interpreted an application model at run-time.

Generation Decision 1: Application models will be used to generate forms at run-time rather than rely on code-generation facilities.

The Pedro Configuration Tool itself represents a data capture tool that uses the application model. In the initial releases of the tool, applications were configured by manually editing a configuration file which associated various application properties with concepts in the domain model. However, over time this model grew more complicated. Eventually we were reminded of the balance point where it takes as much effort to configure an application as it does to write one from scratch. To make configuring the applications easier to do, a configuration tool was developed. This is the MDA design tool but it is generated from an application model that describes all the configuration features that can be associated with other models.

Generation Decision 2: the MDA Design tool is a data entry application that uses a model describing configuration properties. The tool will be generated in the same manner as the other applications it helps create.
4 Pedro Architecture

Pedro uses a model-driven approach to generate generic application features and a collection of plugins to support specialised features. Application forms are generated based on a data model that describes the records and fields which may appear in a document. Form data are manipulated by standard application features or by a collection of plugins supplied by other developers. Figure 4-1 shows the architecture for the tool. The numbers in the diagram label the sequence of events associated with opening and editing a file. The figure is referenced in the next two sub-sections that describe the model-driven aspects and service aspects of the design.

Figure 4-1: Architecture for Pedro

4.1 Model-Driven Aspects

The process begins when Pedro interprets a data model. A Schema Reader reads an XML Schema which describes records and fields that can appear in a document. Although XML Schema is a very expressive language, there are some application features that it cannot express. Pedro compensates for this by having a Configuration File which maps schema concepts to various application properties and services.

Information provided by the Schema Reader and the Configuration Reader is combined to create templates of native data structures which represent form records. These templates are managed by RecordModelFactory (1) and are instantiated whenever Pedro needs to create new record objects to hold data (2).

Pedro serialises its data as XML files which validate against the XML Schema. By default, documents are saved as *.pdz files. This native file format is a ZIP file containing multiple layers of information. The application can also import or export the form data as an XML file. This format is used to produce versions of the document that are submitted to data repositories.

When Pedro reads a document, its I/O routines use the RecordModelFactory to create a tree of data objects (3). This tree is then passed to form generation facilities (4).
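The template mechanism in steps (1) and (2) can be illustrated with a minimal sketch. It is not Pedro's actual code: RecordTemplate and TemplateFactory are hypothetical stand-ins for RecordModel and RecordModelFactory, and their methods and fields are assumptions made only for this example. The sketch shows one template being registered per record class and an independent copy being handed out each time a record is needed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TemplateFactorySketch {

    /** Hypothetical stand-in for RecordModel: a record class name plus field values. */
    static final class RecordTemplate {
        final String recordClassName;
        final Map<String, String> fields = new LinkedHashMap<>();

        RecordTemplate(String recordClassName) {
            this.recordClassName = recordClassName;
        }

        /** Returns an independent copy so that records never share field values. */
        RecordTemplate copy() {
            RecordTemplate clone = new RecordTemplate(recordClassName);
            clone.fields.putAll(fields);
            return clone;
        }
    }

    /** Hypothetical stand-in for RecordModelFactory. */
    static final class TemplateFactory {
        private final Map<String, RecordTemplate> templates = new LinkedHashMap<>();

        /** Step (1): a schema reader registers one template per record class. */
        void register(RecordTemplate template) {
            templates.put(template.recordClassName, template);
        }

        /** Step (2): the application asks for a fresh record built from the template. */
        RecordTemplate createRecord(String recordClassName) {
            RecordTemplate template = templates.get(recordClassName);
            if (template == null) {
                throw new IllegalArgumentException("No template for " + recordClassName);
            }
            return template.copy();
        }
    }

    public static void main(String[] args) {
        TemplateFactory factory = new TemplateFactory();

        RecordTemplate sample = new RecordTemplate("Sample");
        sample.fields.put("name", "");
        sample.fields.put("species", "unknown"); // an illustrative default value
        factory.register(sample);

        RecordTemplate first = factory.createRecord("Sample");
        first.fields.put("name", "S-001");
        RecordTemplate second = factory.createRecord("Sample");

        // Editing the first record leaves the second record and the template untouched.
        System.out.println(first.fields);
        System.out.println(second.fields);
    }
}
```

Copying the template rather than sharing it is the design point: the schema is interpreted once, and every later request for a record is a cheap copy that cannot corrupt the registered definition.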
Pedro can generate applications that suit desktop and TabletPC display devices. The form generators shown in Figure 4-1 use properties of the data objects to render forms and use information held by the ConfigurationReader to render other aspects of the application.

4.2 Service Based Aspects

The generic aspects of form generation and I/O alone would not be sufficient to service many use cases. To support domain-specific functionality, the architecture supports service interfaces that can be implemented by developers. The following sections describe the anatomy and categories of Pedro services.

4.2.1 Service Anatomy

Pedro services adopt a published abstract interface in order to hide implementation details from the rest of the system. They are characterised by a task, a scope of effect and an aspect of persistence. Each of these properties affects the way the service behaves in the application.

4.2.1.1 Task

The task describes what the service does. It is expressed to end-users through a name that would be displayed in a menu or list item, and a description that could be displayed to show more information about the service.

4.2.1.2 Scope of Effect

The scope of effect determines whether a service is meant to affect the current form field, the current form and all of its subforms, or the current document. Document-level services are advertised as menu items in the menu bar of a dialog window. Record-level services are advertised with buttons that appear at the top of a form. Field-level services are indicated through buttons that appear at the end of a form field.

4.2.1.3 Persistence

Services are considered persistent if one service instance is used for all requests and transient if a new service is instantiated for each request. Transient services would tend to be those which are simple and use few computational resources. Instantiating a new service object for each request has the benefit of reducing undesirable side effects on other program variables.
Persistent services tend to be services which either use significant computational resources or which are meant to exhibit a “memory” of user activities throughout a user session. For example, if a service needed to start up a database, then it is better if the system assumes it is persistent to reduce the startup overhead associated with multiple requests.

4.2.2 Access to Application Variables

Services can manipulate the host application in a number of ways. They can access three collections of variables known hereafter as contexts:

- Application context, which refers to program objects that apply to all documents being managed by the current application, e.g. schema definitions.
- Document context, which holds objects that affect a single document, e.g. the current form, or a tree widget that shows end-users how all the form records are organised.
- Form context, which holds objects that affect the current form, e.g. the name of a currently selected field.

The variables maintained by these contexts allow service designers to customise the UI components and the three kinds of scope limit undesirable side effects. For example, designers could use form context to help highlight the currently selected field in blue. However, in using the form context, the change in colouring would not affect the same field as it may appear in other open documents.

4.2.3 Service Types

Pedro supports specialised and generalised services. Specialised services use abstract interfaces that are associated with specific kinds of data entry tasks that were identified as being common to a number of bioinformatics applications. These services address issues such as the generation of unique key identifiers, the mark up of form fields with controlled vocabulary terms and the validation of form data. General services are intended to support all other kinds of tasks that are required to customise the generic aspects of form generation.
Whereas general services are always triggered through an explicit action from end-users, some specialised services may also be triggered as part of programmatic activities. The following sections describe the kinds of specialised services that are supported by the architecture.

4.2.3.1 General Services

General services implement a general plugin interface that is described in later sections. Typically general services will be used to manipulate the record tree representing the data. However, developers can access variables in the Contexts to affect other parts of the application. General services will always be explicitly activated by users by way of pressing a button or menu item.

4.2.3.2 Specialised Services

Specialised services adopt interfaces that support specific data entry tasks. Whereas general services are always triggered through an explicit action from end-users, some specialised services may also be triggered as part of programmatic activities. The following sections describe the services supported by the SOA.

4.2.3.2.1 Validation Services

There are three types of validation services that check the correctness of record data.

Field level validation services validate the content of a particular field. For example, they could check that the value is a legal float number or that it matches a particular regular expression pattern. These services would be triggered whenever the user tried to commit changes to the current record.

Record level validation services detect incorrect combinations of field values within a given record. For example, suppose a form had fields for “gender” and “cancer_type”. A record level validation service could be developed to detect an error for a “male” who had “cervical cancer”. Like field level services, these services would also be triggered when the user tried to commit changes to the current record.

Document level validation services detect data entry errors in the entire document.
These services could identify patterns in one record that don’t fit given the patterns of values found in another kind of record. For example, a document describing an experiment could have records describing a lab protocol that were inappropriate given the kind of sample described in another record. Document level validation services would be triggered either when users explicitly wanted to view errors in the current document or when they attempted to export it to some final submission format. In the latter case, the presence of errors would prevent users from sending final draft documents to data repositories. This feature would help ensure that submissions had a high level of data quality.

4.2.3.2.2 Ontology Services

An ontology service allows end-users to mark up a form field with terms that come from a controlled list of terms. Although its scope of effect is a single form text field, it may use other information about the current form, the user or document to help constrain the choices of terms it provides to the viewer. This kind of service is described more in Chapter 6.

4.2.3.2.3 ID Generator Service

This service generates a unique key value for a form text field. Whereas ontology services help provide values that carry semantic significance, ID generator services simply provide keys to uniquely identify a record with respect to other records in the same document or perhaps even records within other documents. For example, they could be used to uniquely tag samples in an experiment. Implementations of the ID Generator Service interface could provide a key that did not already appear in a database or records.

5 Description of Subsystems

Section 4 provided a high-level overview of the architecture for Pedro. The following sections cover aspects of the design in far greater detail.
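Before turning to the subsystems, the record level validation example from Section 4.2.3.2.1 (“gender” and “cancer_type”) can be made concrete with a small sketch. The interface RecordValidationService, its validate method and the representation of a record as a map of field names to values are all assumptions made for illustration; Pedro's real validation interfaces are not reproduced in this document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class RecordValidationSketch {

    /** Hypothetical service interface: returns one message per problem found. */
    interface RecordValidationService {
        List<String> validate(Map<String, String> fieldValues);
    }

    /** Detects an implausible combination of values within a single record. */
    static final class GenderCancerTypeValidator implements RecordValidationService {
        @Override
        public List<String> validate(Map<String, String> fieldValues) {
            List<String> errors = new ArrayList<>();
            String gender = fieldValues.get("gender");
            String cancerType = fieldValues.get("cancer_type");
            // equalsIgnoreCase on a literal is safe even when a field is missing (null).
            if ("male".equalsIgnoreCase(gender)
                    && "cervical cancer".equalsIgnoreCase(cancerType)) {
                errors.add("cancer_type 'cervical cancer' is inconsistent with gender 'male'");
            }
            return errors;
        }
    }

    public static void main(String[] args) {
        RecordValidationService validator = new GenderCancerTypeValidator();
        // An empty list means the record passed this check.
        System.out.println(validator.validate(
                Map.of("gender", "male", "cancer_type", "cervical cancer")));
        System.out.println(validator.validate(
                Map.of("gender", "female", "cancer_type", "cervical cancer")));
    }
}
```

Returning a list of messages rather than a boolean would let a host application show every problem at once when the user tries to commit a record; whether Pedro reports errors this way is not stated here.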
5.1 Schema Reader

5.1.1 Purpose

The schema reader is a Java class that implements “SchemaReaderInterface”, and is responsible for using XML schema properties to create template record definitions. These definitions are instantiated whenever files are read or when the users create new record objects in a data set.

5.1.2 Description

Pedro uses a schema reader that interprets model properties and uses the values to create templates of native data structures that are instantiated to hold data. The application interacts with an interface “SchemaReaderInterface” which can be implemented to interpret XML Schemas or other types of models. Currently, the main schema reader class is pedro.mda.schema.MSVSchemaReader. Every schema reader follows the same algorithm:

1. extract model properties
2. use properties to set attributes in classes used to produce template records (eg: EditFieldModel, ListFieldModel, RecordModel)
3. set additional attributes of template records using properties read from the Configuration Reader
4. submit template record definition to RecordModelFactory

5.1.3 Design History

Pedro’s schema reader has been reworked twice since its initial release. The first version used a DOM parser to scan for particular structures in an XML schema. The decision to have Pedro rely on its own native data structures instead of generic ones like DOM objects helped insulate it from the implementation details of specific kinds of model parsing technologies. The model parser could be substituted provided it could provide enough properties to create template record definitions. This became an important benefit when I later encountered a company which used a different kind of data modelling language than XML schema.

In 2003, I met representatives from Epistemics (http://www.epistemics.co.uk), a company that had developed knowledge acquisition software.
PCPack allowed data modellers to graphically model knowledge that they elicited from domain experts during structured interviews. Their software was being used in the aerospace industry to model the relationships and properties of various aircraft parts. Although their software supported data modelling, it did not have a feature which could generate prototype software applications from the models. We expressed interest in collaborating with one another.

I realised that in order to accommodate using both an XML schema reader and a schema reader which read their own bespoke XML file formats, I had to develop an abstract interface called “SchemaReaderInterface”. Pedro’s schema reader was reworked so that the rest of the program communicated with it via an interface rather than with a specific implementation. A schema reader was successfully developed to interpret certain kinds of models developed in PCPack. Once again, the procedure was to interpret the schema once and use the model information to produce templates of Pedro’s native data structures. Due to resource constraints, the collaboration didn’t proceed further. However, the work allowed an interface to be developed in a generic way that could accommodate other implementations of the schema reader.

In 2004, the EBI’s Kai Runte offered to rewrite the schema reader. This time, the schema reader would rely on Sun Microsystems’ MSV Schema Reader. MSV was designed to parse a wide variety of schemas and provide information about them in syntax trees. The algorithm for creating the templates was the same, but a wider range of schemas could be supported. The MSVSchemaReader developed by Kai was tested on the MAGE-ML XML Schema, which had been mechanically generated from a DTD definition. We were able to load legacy MAGE-ML data files successfully and show that data files could validate against the schema.

MSVSchemaReader has been used for the last couple of years to drive all the other Pedro products.
There are some deficiencies with it. Although the class clearly uses a Visitor pattern, the code is very complicated. The schema reader only interprets about 11 out of 44 possible XML schema structures, so it remains unable to understand a variety of models developed outside Manchester University. Kai left the project in 2006 and since then we’ve had no more in-house knowledge about how it works. We know that it *does* work, but we have been reluctant to change anything because so much of the code base relies on Pedro’s template record definitions.

By the Summer of 2006, it was clear that the limitations of the schema reader presented a barrier to widespread uptake of the tool in other communities. The development team recognised that although a great variety of models could be accommodated by the tool, the schema reader needed to be enhanced.

A decision was made to substitute rather than enhance the MSVSchemaReader class. Although it is a stable product, MSV appears to be maintained by one developer and it now seems a bit dated. We felt it was a strategic gain to modify Pedro so that it worked with a more modern schema reader technology.

In the Spring of 2007, Chris Garwood evaluated a number of different schema reader technologies. We’re currently in the process of trying to replace the MSVSchemaReader class with one that uses Castor. Castor generates Java class definitions based on record definitions provided in an XML schema. Using Castor with Pedro will require that the schema reader use Java’s Reflection facilities to interpret properties of generated classes. These properties will be used to create template record definitions in the same way as in the previous efforts to write a schema reader implementation.

XML Schema properties which are used to set attributes in Pedro’s native data structures are described in detail in Section 5.2.

5.1.3 Scope of Effect

The Schema Reader is only used once when an application starts.
Its job is to help create templates of Pedro’s native data structures. These templates will be instantiated to hold the information from a data set. Replacing the schema reader will require rigorous unit testing because of the potential for errors when the schema reader uses model properties to set attributes in the template record definitions. The activity lends itself to automated testing and will not require testing of the application itself. For example, test cases could run the schema reader and then perform JUnit test cases that verify that records have the expected number and type of fields.

5.1.4 Relevant Code Packages

This subsystem depends on the following packages and classes:

- pedro.mda.schema.*
- pedro.mda.config.*
- pedro.mda.model.*

Currently, the entire suite of Pedro tools depends on the work of pedro.mda.schema.MSVSchemaReader, which implements pedro.mda.schema.SchemaReaderInterface. The schema reader interprets schema properties and sets attributes in templates of ListFieldModel, EditFieldModel, AttributeFieldModel, RecordModel, RecordModelReference, which are all defined in the pedro.mda.model.* package. This package also contains RecordModelFactory, which is where the templates are registered. MSVSchemaReader is called within pedro.mda.schema.Startup, a class that is used in every model-driven tool in the Pedro tool suite.

5.2 Native Data Structures

5.2.1 Description

The basic native data structure in Pedro is the RecordModel class, which comprises a number of DataFieldModel objects. Figure 5-1 shows the different kinds of data fields that can be created:

Figure 5-1: The inheritance hierarchy for data field classes supported in Pedro.
All fields will have properties such as:

- a field name
- a help link for a URL
- whether the field is required or optional
- the kind of field view type (eg: “RADIO_FIELD”, “DATE_FIELD” etc)
- whether the field is an attribute or not (this probably belongs in the EditFieldModel class)
- text to appear when an end-user hovers over the field label

The two kinds of fields are EditFieldModel, which manages a single value, and ListFieldModel, which contains one or more RecordModel objects. EditFieldModels will have:

- a value represented as a string
- a default value that should appear when forms are rendered with a new record model object
- whether the field value should be included as part of the display name that represents the containing record.

ListFieldModel objects will know what kinds of record models they can contain and will have a collection of child RecordModel objects.

There are four kinds of EditFieldModel subclasses, although they have few extra properties. A GroupFieldModel is a kind of EditFieldModel where end-users select a value from a list. Its subclass BooleanFieldModel constrains this list of choices to “true” and “false”. TextFieldModel is a marker class for identifying fields that can be associated with ontology services or id generation services. An IDFieldModel is a kind of TextFieldModel which also has an IDGeneratorService. This service creates an identifier value that can be inserted into the form field. An IDFieldModel is the data container object that corresponds to an attribute field in XML Schema.

A RecordModel is a collection of DataFieldModel objects. Figure 5-2 shows how the RecordModelFactory, RecordModel, EditFieldModel and ListFieldModel classes relate:

Figure 5-2: Aggregation relationships for Pedro native data structures

A RecordModel will contain a collection of EditFieldModel objects and a collection of ListFieldModel objects. Each ListFieldModel can contain multiple RecordModels.
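The aggregation relationships can be sketched in a few lines of Java. The classes below (SimpleRecord, SimpleEditField, SimpleListField) are deliberately simplified, hypothetical counterparts of RecordModel, EditFieldModel and ListFieldModel; their fields and the displayName method are assumptions for illustration only. In particular, joining display name components with a hyphen is an arbitrary choice, since the document does not say how Pedro combines them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class NativeStructuresSketch {

    /** Simplified edit field: holds one string value. */
    static final class SimpleEditField {
        final String name;
        final boolean displayNameComponent; // contributes to the record's display name
        String value = "";

        SimpleEditField(String name, boolean displayNameComponent) {
            this.name = name;
            this.displayNameComponent = displayNameComponent;
        }
    }

    /** Simplified list field: holds child records. */
    static final class SimpleListField {
        final String name;
        final List<SimpleRecord> children = new ArrayList<>();

        SimpleListField(String name) {
            this.name = name;
        }
    }

    /** Simplified record: a collection of edit fields and a collection of list fields. */
    static final class SimpleRecord {
        final String recordClassName;
        final List<SimpleEditField> editFields = new ArrayList<>();
        final List<SimpleListField> listFields = new ArrayList<>();

        SimpleRecord(String recordClassName) {
            this.recordClassName = recordClassName;
        }

        /** Builds a display name from flagged fields; falls back to the class name. */
        String displayName() {
            String joined = editFields.stream()
                    .filter(f -> f.displayNameComponent && !f.value.isEmpty())
                    .map(f -> f.value)
                    .collect(Collectors.joining("-"));
            return joined.isEmpty() ? recordClassName : joined;
        }
    }

    public static void main(String[] args) {
        SimpleRecord experiment = new SimpleRecord("Experiment");
        SimpleEditField title = new SimpleEditField("title", true);
        title.value = "Pilot";
        experiment.editFields.add(title);

        SimpleListField samples = new SimpleListField("Sample");
        experiment.listFields.add(samples);
        samples.children.add(new SimpleRecord("Sample")); // a child record with no values yet

        System.out.println(experiment.displayName());
        System.out.println(samples.children.get(0).displayName());
    }
}
```

Because a list field holds records and a record holds list fields, the structure is recursive, which is what allows a whole document to be represented as a tree of records.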
When the SchemaReader is operating, it adds RecordModel instances to the RecordModelFactory. The instances act as templates that can be cloned whenever a new record model is needed for a data set. The most popular utility class in the pedro.mda.model.* package is RecordModelUtility. It has a number of methods to help group fields in different ways.

There is a strong relationship between properties of the data model and properties of the native data structures. The following tables list the attributes of native data structures that should be set with values derived from an XML schema.

5.2.1.1 RecordModel Properties

Property: record_class_name
Description: the name of the record
Property Provider: XML Schema Reader: <xs:element name="[record_class_name]"> <xs:complexType><xs:sequence>...</xs:sequence></xs:element>

Property: helpLink
Description: the URL for a web page that describes the form concept
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.SchemaConceptConfiguration

Property: ontology_identifier
Description: a unique identifier that can be associated with a schema concept. This is useful if ontology services want to use a schema concept’s ontology identifier to help limit what values are presented to the end-users.
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.SchemaConceptConfiguration

Property: form_comments
Description: comments that appear on the form and describe the schema concept
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.SchemaConceptConfiguration

Property: tool_tip
Description: text that appears when end-users let their mouse cursor hover over a field label.
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.SchemaConceptConfiguration

Property: recordValidationServices
Description: collection of descriptions of record-level validation services
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.RecordConfiguration

5.2.1.2 DataFieldModel Properties

Property: name
Description: name of the field. Note that for list fields, the name is the record class name of another record structure or the name of a group of record class names
Property Provider: XML Schema Reader: <xs:element name="[name]" .../> <xs:element ref="[name]" .../> for edit fields or <xs:group ref="[name]" .../> for list fields.

Property: isRequired
Description: determines whether a field is optional or required
Property Provider: <... minOccurs="0" .../> means the field is optional. <... minOccurs="1" .../> means the field is required.

Property: helpLink
Description: the URL for a web page that describes the form concept
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.SchemaConceptConfiguration

Property: fieldViewType
Description: gives an indicator of how the field should be rendered.
Property Provider:
- type="xs:string" indicates the field type is TEXT_FIELD.
- type="xs:date" indicates the field type is DATE_FIELD. The field label will be followed by the date format pattern enclosed in parentheses.
- If a field has at most three restriction values, the field type will be a RADIO_FIELD. The field will be rendered with radio buttons. If there are more than three restriction values, the field type will be COMBINATION_FIELD. The field will be rendered with a dropdown list of choices.
- type="xs:anyURI" indicates the field type is URI_FIELD. The field will be rendered with a browse button that allows end-users to search for a file.
- <xs:attribute .../> indicates the field type will be an ID_FIELD. In the main Pedro form, attribute fields are shown first, followed by all the other fields. An ID_FIELD will also have a “Generate Key” button that users can press to generate an identifier value.
- <xs:element ref=".." ... maxOccurs="1" .../> indicates a field will be ONE_TYPE_ONE_VALUE_LIST. This form field will have a desensitised text field and “New” and “Edit” buttons.
- <xs:element ref=".." ... maxOccurs="unbounded" .../> indicates a field will be ONE_TYPE_N_VALUE_LIST. This form field will have a scrollable list showing display names of sub-records. It will also have “New”, “Edit” and “Delete” buttons.
- <xs:group ref=".." ... maxOccurs="1" .../> indicates a field will be a N_TYPE_ONE_VALUE_LIST. The form field will have a desensitised text field and a combination box that lets the user choose which type of record to create.
- <xs:group ref=".." ... maxOccurs="unbounded" .../> indicates a N_TYPE_N_VALUE_LIST field view type. The form field will have a scrollable list showing display names of sub-records. It will also have a combination box that lets the user choose which type of record to create. When users select a type of record, the list filters to show records of that type.

Property: ontology_identifier
Description: a unique identifier that can be associated with a schema concept. This is useful if ontology services want to use a schema concept’s ontology identifier to help limit what values are presented to the end-users.
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.SchemaConceptConfiguration

Property: form_comments
Description: comments that appear on the form and describe the schema concept
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.SchemaConceptConfiguration

Property: tool_tip
Description: text that appears when end-users let their mouse cursor hover over a field label.
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.SchemaConceptConfiguration

5.2.1.3 EditFieldModel Properties

Property: defaultValue
Description: the default value that should be displayed whenever a new record containing this field is displayed.
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.EditFieldConfiguration

Property: allowFreeText
Description: determines whether a text field entry can accept free-text entries; this is not applicable for fields that have drop-down lists or radio buttons.
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.EditFieldConfiguration

Property: scrollingTextField
Description: determines whether a text field is displayed with one line of text or a scrolling text area
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.EditFieldConfiguration

Property: editFieldValidationServices
Description: validation services that can be applied to the value held by the edit field model
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.EditFieldConfiguration

Property: isDisplayNameComponent
Description: determines whether the field is used to derive the name of the containing record
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.EditFieldConfiguration

Property: units
Description: the units associated with a field; typically only of use for numeric fields
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.EditFieldConfiguration

Property: fieldValidationServiceConfiguration
Description: a collection of descriptions of field validation services
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.EditFieldConfiguration

Property: ontologyServiceConfigurations
Description: a collection of descriptions of ontology service descriptions; this is only applicable to text fields.
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.EditFieldConfiguration

Property: editingComponentClassName
Description: the class name of an editing component; this should desensitise the part of the form field that holds a value. Instead, users click on “Edit” to invoke a separate editing component
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.EditFieldConfiguration

5.2.1.4 IDFieldModel Properties

Property: idGeneratorService
Description: generates an identifier value that Pedro inserts into the text field whenever the “Generate Key” button is pressed
Property Provider: Pedro Configuration Tool; see also pedro.mda.config.AttributeFieldConfiguration

5.2.1.5 GroupFieldModel Properties

Property: choices
Description: the choices provided in a drop down list
Property Provider: XML Schema Reader: <xs:restriction base="xs:string"> <xs:enumeration value="value1"/> <xs:enumeration value="value2"/> </xs:restriction>

5.2.1.6 BooleanFieldModel Properties

Property: choices
Description: the choices provided in a drop down list
Property Provider: XML Schema Reader: type="xs:boolean". BooleanFieldModel, a subclass of GroupFieldModel, forces the choices to be “true” and “false”

5.2.1.7 ListFieldModel Properties

Property: fieldValidationServiceConfigurations
Description: a collection of list field validation services
Property Provider: Pedro Configuration Tool; see also pedro.mda.config.ListFieldConfiguration

Property: listFieldEditingComponentConfigurations
Description: a collection of descriptions of components that can create or edit different kinds of records in a list field.
Property Provider: Pedro Configuration Tool; see also pedro.mda.config.ListFieldConfiguration

5.2.2 Design History

During the onset of the project, there was a great temptation to make Pedro hold all its data in DOM objects. This was because the main I/O routines were written using the DOM parser and the result of the activity was a complete in-memory tree of DOM objects. There could have been some benefit in having Pedro rely on a generic data structure that was used in other projects.
However, the DOM object model had shortcomings. The generic data objects had generic means of accessing and changing data. The task of finding specific fields within a record was cumbersome and required many looping constructs. The cumbersome nature of the DOM API led me to develop a collection of native data structures that could better cater for operations supported by the application. This proved to be a good decision because Pedro later failed to perform satisfactorily when it loaded large data files. Performance improved dramatically when the file reading routines switched from using the DOM parser to the SAX parser. The SAX parser did not produce DOM objects, so it would not have made sense for the rest of the code base to depend on the generic data structures.

Initially the native data structures held information about both the model and view aspects of records and fields. This later caused severe performance problems because Swing-based field views were created for each field whether they were being viewed or not. The structures were later reworked in a way that rigidly separated model and view aspects. View components were only ever generated for fields in the currently displayed record.

The next problem with the native data structures related to serialisation. The copy and paste feature in Pedro required that all objects and the objects they referenced were serialisable. This worked well until serialisation encountered things like ValidationService and OntologyService objects. These were interfaces that could be implemented as Java classes that themselves were not serialisable. I attempted to require that all services were serialisable but then decided to strip native data structures of references to services. The only service reference that remains as an artefact is the way EditFieldModel references validation services.
In the revised approach, the form generation facility would receive a form field, look up properties in the configuration reader, and instantiate services when the fields were actually displayed on the screen.

Still, many of the native data structures had too much code. Many of them, such as RecordModel, had a number of utility methods which did things like identify different groups of fields. This functionality was migrated to a new class called RecordModelUtility.

The result is that the data structures now hold mostly data and not information about views and services. The exception is how DataFieldModel knows about a FieldViewType. This type is determined by the schema reader and is used to encode rendering hints into the model object. For example, the schema reader can tell whether a list field can support one or multiple types of sub-records. It marks the model object with field view types such as “ONE_TYPE_ONE_VALUE_LIST” and “N_TYPE_ONE_VALUE_LIST”. Pedro’s pedro.desktopDeployment.FieldViewFactory uses this field type value to determine what kind of form field to create for visualising the field data.

5.2.3 Scope of Effect

Pedro’s native data structures are used ubiquitously in all tools in the Pedro suite.

5.2.4 Relevant Code Packages

All of the model classes are defined in the pedro.mda.model.* package.

5.3 Pedro Contexts

5.3.1 Purpose

To provide an extensible means for services to use parameter values that come from the Pedro application or from other software applications.

5.3.2 Description

Pedro manages a number of global variables through the use of three classes: PedroApplicationContext, PedroDocumentContext and PedroFormContext. These classes are types of HashMaps which have a predefined set of keys representing various objects in a Pedro application. All service interfaces supported in Pedro allow developers to access these objects. Figure 4-3 illustrates the way the context objects relate to one another.
Figure 4-3: The relationships among Pedro context classes

The PedroApplicationContext defines a number of keys which refer to objects that apply to all dialogs. For example, PedroApplicationContext.RECORD_MODEL_FACTORY is a key that is associated with the RecordModelFactory object. The same factory object will be used regardless of which service in which window is using it. It would be called from within a service as follows:

    RecordModelFactory recordModelFactory
        = (RecordModelFactory) pedroFormContext.getProperty(PedroFormContext.RECORD_MODEL_FACTORY);

Other objects have a scope limited to a single dialog. PedroDocumentContext.NAVIGATION_TREE refers to the NavigationTree object that displays records in the left part of a Pedro Dialog. A service could access the NavigationTree object as follows:

    NavigationTree navigationTree
        = (NavigationTree) pedroDocumentContext.getProperty(PedroDocumentContext.NAVIGATION_TREE);

PedroFormContext holds references to objects that relate to the currently displayed record. The key PedroFormContext.CURRENT_FIELD refers to the currently active field in the form.

5.3.3 Design History

The development of Pedro Contexts was in response to the effect that ad-hoc development requests had on the collection of services that were supported by the tool. In some cases, a service needed to have access to another part of the application or some additional value. Delivering an extra parameter value to the service often required that the value was passed along through a delegation chain of objects that had nothing to do with the service operation. This caused parameter bloating in the methods of many classes, especially GUI-based classes. Moreover, because different services had slightly different parameters, it wasn’t possible to take advantage of common code properties. The more similar services are to one another, the easier it is to write code which can service all of them.
All of the Pedro services needed an extensible way to use new kinds of information that could be supplied by objects in the Pedro application or those produced in other systems. Hence, all services were reworked so that they expected context objects. In future, Pedro will support key values for objects that come from third party software products. Data modellers will be able to add more properties using the PedroConfigurationTool.

5.3.4 Scope of Effect

In the Pedro 2.0 code base:
- PedroFormContext appears in 142 source files
- PedroDocumentContext appears in 35 files
- PedroApplicationContext appears in 70 files

These classes are referenced extensively in other tools from the Pedro Project such as Pierre. Adding a new key in any of the contexts will make it accessible to all services.

5.3.5 Relevant Code Packages

Pedro’s context classes are defined in the pedro.system.* package.

5.4 Validation Services

5.4.1 Purpose

To provide field-level, record-level and document-level validation services.

5.4.2 Description

Pedro supports validation services which can affect a field, record or document. Field and record validation services are triggered whenever the end-user attempts to commit changes to the current record. The exceptions are field-level validation activities that check whether a required edit field has a value or if a required list field has one child record. In the Pedro tool, document-level validation services are only triggered when end-users try to export a data set to a final submission format or when they use the “Show Errors” feature in the View menu.

Field-level validation services are intended to identify problems in the value of an edit field or with the composition of child records found in a list field. Record-level validation services are intended to identify field values which are legitimate when considered in isolation but are wrong when considered in combination with other field values.
For example, a form could have fields such as “cancer_type=ovarian” and “gender=male” which form an illegal combination of values. Document-level validation services are intended to identify errors that appear in disparate parts of the same data set.

The main validation utility used in Pedro is pedro.soa.validation.ValidationFacility, which can be used to validate a field, record or document. The class has options for including or excluding certain types of errors from the validation activity. The pedro.soa.validation.* package has a small hierarchy of interfaces shown below:

Figure 5-4: the inheritance hierarchy of validation service classes

The top level class is pedro.soa.ServiceClass, which provides methods for setting and getting parameters. It also has a method for setting the resource directory, which is the default directory where files are expected to be found. Typically this will be the ./models/[project]/resources directory found in each model folder. The code base needs to be corrected so that all three of document, record and field level validation services subclass from ServiceClass.

All the other validation service classes have methods that reference “pedroFormContext”. This is a HashMap that references a wide range of objects that are part of the Pedro application. For example, pedroFormContext has references to the RecordModelFactory, the NavigationTree and the currently displayed RecordModel. The HashMap allows services to use other parts of the application to inform how they proceed with validation.

DocumentValidationService has a method which expects the root record model in a data set. RecordModelValidationService has a similar method but it merely expects the current record model to be passed to it. FieldValidationService has fields for setting the field name and whether the field is required or not. The required field setting determines whether field validation services check that a field is empty or not.
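The record-level example given earlier in this section (an ovarian cancer type combined with a male gender) can be sketched as follows. This is only a minimal illustration and not Pedro's actual API: SketchAlert and CancerGenderCheck are hypothetical names, and the field values are modelled as a plain Map, whereas a real record-level service would receive a RecordModel and a pedroFormContext. The only detail taken from this document is that a validation call returns a collection of alert objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Pedro's Alert class (an assumption for this sketch).
final class SketchAlert {
    final String message;

    SketchAlert(String message) {
        this.message = message;
    }
}

// Hypothetical record-level check: each value is legal in isolation,
// and only the combination of the two values is reported as an error.
final class CancerGenderCheck {
    static List<SketchAlert> validate(Map<String, String> fieldValues) {
        List<SketchAlert> alerts = new ArrayList<>();
        String cancerType = fieldValues.get("cancer_type");
        String gender = fieldValues.get("gender");
        if ("ovarian".equals(cancerType) && "male".equals(gender)) {
            alerts.add(new SketchAlert(
                "cancer_type=ovarian is not a legal combination with gender=male"));
        }
        return alerts;
    }
}
```

An empty list means the record passed this check; a field-level service could not express the same rule because it only sees one value at a time.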
EditFieldValidationService is the same as ListFieldValidationService, except that the former expects a String field value and the latter expects a ListFieldModel object.

Most of the classes in pedro.soa.validation.* perform type-based error checks on fields. Subclasses of AbstractEditFieldValidationService check for double, integer and float type errors. Each of these services also has “bounded” versions which consider lower and upper limit values. Figure 5-5 shows the variety of field level validation services that are automatically associated with the data type of a field specified in the XML schema:

Figure 5-5: inheritance hierarchy of default edit field validation services that provide basic type checking capabilities.

The DateValidator class is special in that it relies on a static regular expression value for the date format. The value can be defined in the Pedro Configuration Tool. Once the value is set, all instances of the DateValidator check that a date value matches the specified format.

The StringMaskValidator class is used to validate field values against a regular expression that is defined as a restriction in the XML schema. This provides a powerful way of validating form fields, such as constraining values to a certain number of characters or requiring values to have a particular prefix or naming convention.

Many of the validation classes implement a ConstraintDescription interface. This is used by some Pedro project tools such as Pierre to obtain a human-readable description of what a validation service does. This text is included in auto-generated functional specifications.

In order to minimise the number of times validation services are instantiated, the ValidationServiceRegistry manages all services created in the system. Service designers can use the Pedro Configuration Tool to specify whether a service should be persistent or transient.
The ValidationServiceRegistry uses this configuration setting to determine whether it creates a new instance of a validation service each time it is asked, or whether it returns the same instance of a service each time.

Validation facilities in Pedro are extended by the Pedro Alerts System described in Section 4.10. To allow validation services and alerts to work together, the “validate” method for all validation services returns a collection of Alert objects which could represent errors or warnings.

5.4.3 Design History

Originally, all the validation services dealt with errors in edit fields. As the project progressed, the interfaces for various services became more complicated. For Pedro 2.0, all of Pedro’s service classes were overhauled so that services had a more consistent API. One of the most important changes was allowing service classes access to a wide range of application objects that might help guide their activities. More of this is covered in the discussion on Pedro Contexts in Section 5.3.

5.4.4 Scope of Effect

The current schema reader associates most type checking services with record fields. Record and field validation services are activated in the following cases:
- the “Keep” or “Done” button is pressed on the main form
- the “New” or “Edit” button is pressed in the list fields of the main form

pedro.desktopDeployment.RecordView and pedro.desktopDeployment.ListValueButtonPanel both show examples of enacting validation actions. Document level validation services are triggered when users try to use the “Export to Final Submission Format” button in the File menu or when they try to use the “Show Errors” button in the View menu. Code that calls validation services can be found in the FileMenu and ViewMenu classes found in both the pedro.desktopDeployment.* and pedro.tabletDeployment.* packages.

5.4.5 Relevant Code Packages

Most classes related to validation are defined in the pedro.soa.validation.* package. Other classes appear in pedro.soa.alerts.*.
Code that calls validation services appears in the FileMenu, ViewMenu, RecordView and ListValueButtonPanel classes of the pedro.desktopDeployment.* package. The pedro.tabletDeployment.* packages contain similar classes.

5.5 Ontology Services

5.5.1 Purpose

To provide a system which allows end-users to mark up form fields using terms from multiple ontology services.

5.5.2 Description

The Pedro Ontology Service Framework manages ontology services which collect and render ontology terms for end-users in some kind of display.

5.5.2.1 Basic Data Structure: Ontology Term

The basic unit of information handled by the services is the OntologyTerm, which comprises a label, a unique identifier and a collection of related terms (Figure 5-6).

Figure 5-6: ontology term data structure. All ontology terms can be represented by the OntologyTerm class and those which can be ordered as a tree of concepts can be represented by TreeOntologyTerm as well.

The label represents the word phrase that would be presented to an end-user in a display. The unique identifier represents the concept referred to by the term. For example, “cat” and “le chat” are word phrases for the same concept ‘cat’. The concept could have an identifier such as “www.dictionary.org/cat_01”. Humans will relate to the label whereas software agents that manage the ontology will relate to the unique identifier. The collection of related terms does not refer to a specific type of relationship such as “has a” or “is a”. The way to determine the kind of relationship of related terms is covered later in this section.

TreeOntologyTerm is a subclass of OntologyTerm which also has a notion of a parent term. It is used to support ontologies that are structured as a tree of concepts.

5.5.2.2 Ontology Provenance

Pedro uses OntologyTerm as a lightweight data container for holding information about ontology terms. However, when users select and use terms, Pedro attempts to find out more information about them.
In the sections covering OntologySource and OntologyViewer, there are interface methods for “getOntologyTermProvenance”. These methods cause a source or viewer agent to return data that describes more details about a given ontology term. OntologyTermProvenance contains information described in the SKOS standard. Most of these details are probably only meant to be processed by software agents. Pedro saves the meta data about selected ontology terms in its *.meta data layer (see Section 4.9). The provenance information is important in cases where an ontology term has been reclassified in the same ontology, or if a term has been deprecated.

5.5.2.3 Ontology Services

The Pedro Ontology Service Framework can associate one or more ontology services with a form field. An OntologyService will have at most one OntologySource and one OntologyViewer (Figure 5-7). It can have one, the other or both of these kinds of objects.

Figure 5-7: an ontology service, which comprises at most one ontology source and one ontology viewer.

An OntologySource is an agent that provides ontology terms to the system. It is an interface which is intended to hide implementation details of how terms are read from a storage medium. For example, the terms could be a list of words in a simple text file, a set of rows in a relational database table, or tag values in some XML-based file. The terms could also be managed locally or remotely. The interface is designed to shield the rest of the application from these data management details.

An OntologyViewer visualises ontology terms for end-users and is designed to accept OntologyTerm objects provided by an OntologySource. A viewer can present data in a number of ways such as a simple list, a table, a graph, an image map or a collection of images. Pedro’s use of this interface allows it to be insulated from details of how terms are rendered for the end-users.
The main purpose of the viewer is to provide the system with a collection of selected terms that can be inserted into a form field.

If a service has both a source and a viewer specified, then terms provided by the source are given to the viewer to render in some kind of display. If only a source is provided, then Pedro associates it with its own default ontology viewer. If only the viewer is provided, then Pedro assumes the agent will combine responsibilities of reading and presenting ontology terms.

The flexibility of mixing and matching source and viewer components is meant to make it easy to integrate legacy components. With this scheme, a developer can wrap the part of an application that parses terms, the part that views terms, or both. The same application can be wrapped as a source and a viewer, allowing the same component to be marketable for use in other ontology services.

There are a couple of reasons why developers may want to wrap just a viewer. Sometimes the legacy application may not have a rigid separation between its model and view aspects. In this case, the source may somehow be tied to graphical objects even though its job does not include rendering activities. The viewer could require specialised parser routines that contain information which is incompatible with the OntologyTerm objects provided by some other source.

Another reason is to allow the ontology viewer to take advantage of implementation details in the ontology source. A source hides most of its implementation details from an OntologyViewer. The viewer accesses information about the ontology via the methods in the interface used by the source. In some cases, developers may want to expose rather than hide certain formalisms. For example, an OWL-based ontology can support various logical arguments and can support a variety of complex relationships.
It may be desirable for expert end-users to have more features in the viewer which take advantage of ontology terms as they are expressed in OWL rather than from the term objects provided by a source.

5.5.2.4 Ontology Source

Figure 5-8 describes the OntologySource in detail, as well as showing some of the default ontology sources that come bundled with the application. Most of the methods in OntologySource take an argument “pedroFormContext” which is an instance of the class by the same name. PedroFormContext is a HashMap that allows developers to reference parts of the Pedro application from within the service. It is described in more detail in Section 5.3. Table ZZZ describes the major methods of the OntologySource interface:

Method: isWorking
Description: a diagnostic method used by Pedro to determine whether a source is fit to use or not. The most likely causes of a failure in the source are that it can’t find some resource file it’s looking for or it can’t connect to the Internet. The result of this method can determine whether an ontology service is listed for the end-user to use.

Method: getOntologyTermProvenance
Description: the way the source provides provenance data for a given ontology term. This method is called when users decide to select a term in the viewer. The viewer attempts to capture all the information about the term so it can be included in Pedro’s meta data file (see Section 4.9)

Method: containsTerm
Description: designed to let the OntologySource be used in other contexts such as search and retrieval services. The idea is that the same source can be used to tag a field with a term and to perform a lookup operation to check that the term is part of its ontology.

Method: getTerms
Description: returns a collection of ontology terms. This is used by viewers to render an ontology as a list.

Method: getSubOntologySource
Description: this is used to return an ontology which represents part of a larger one. For example, consider an ontology which is a very large taxonomy file of animal species. The same taxonomy could be used in a number of data forms but only a branch of the taxonomy is needed for any given form field. The parameters of the method could include anchor terms which help initialise the starting point of an ontology for a given field.

Method: getSupportedRelationships
Description: An ontology could support multiple ways of relating terms to one another. This method returns the list of relationship types used by the ontology.

Method: getRelatedTerms
Description: given an ontology term and the name of a supported relationship type, this method returns a collection of related terms.

Method: getOntologyServiceMetaData
Description: this is basic meta data information about the service that includes:
- name
- author
- description
- version
- what formalisms are supported
- a contact email
- a unique code that identifies the software agent
This method is principally called by OntologyService. By default the service calls the same method in its viewer. If the viewer isn’t present or if the viewer defers answering the request to its source, then this method is called.

TreeOntologySource extends the OntologySource interface with a simple method for getting the root of a tree of ontology terms. Most of Pedro’s default ontology sources rely on ontologies that can be represented as trees.

Figure 5-8: default ontology sources supported in Pedro.

5.5.2.5 Ontology Viewer

OntologyViewer replicates many methods of OntologySource because the viewer may elect to delegate to its source for certain method calls. There are two distinguishing methods of the viewer interface (Figure 5-9). “getSelectedOntologyTerms” returns the collection of terms the end-user has selected in the viewer display. “setOntologyTermSelectionListener” registers a component to be notified when the users have indicated in the viewer that they want to use selected terms for marking up a form field.

Figure 5-9: the ontology viewer interface.
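The interplay of the two distinguishing viewer methods can be sketched without any GUI code. The sketch below is an assumption-laden illustration rather than Pedro's real interface: TermSelectionListener, its termsSelected() method, SketchViewer, select(...) and confirmSelection() are hypothetical names, and terms are reduced to plain String labels. Only the two method names getSelectedOntologyTerms and setOntologyTermSelectionListener come from this document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener type; the real listener interface is not shown in this document.
interface TermSelectionListener {
    void termsSelected();
}

// Hypothetical, GUI-free stand-in for an ontology viewer. Terms are plain labels here.
final class SketchViewer {
    private final List<String> selected = new ArrayList<>();
    private TermSelectionListener listener;

    // Registers the component that wants to know when the user commits a selection.
    void setOntologyTermSelectionListener(TermSelectionListener listener) {
        this.listener = listener;
    }

    // Returns a copy of the terms currently selected in the display.
    List<String> getSelectedOntologyTerms() {
        return new ArrayList<>(selected);
    }

    // Simulates the end-user highlighting a term in the display.
    void select(String label) {
        selected.add(label);
    }

    // Simulates the end-user pressing the button that means "use the selected terms".
    void confirmSelection() {
        if (listener != null) {
            listener.termsSelected();
        }
    }
}
```

A caller would register a listener that pulls the selection from the viewer once it is notified, for example `viewer.setOntologyTermSelectionListener(() -> inserted.addAll(viewer.getSelectedOntologyTerms()));`. The viewer pushes only the notification; the listener pulls the terms, which keeps the viewer unaware of what is done with them.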
5.5.2.6 Default Viewer’s Use of Introspection on Ontology Sources

In most cases, data modellers will create ontology services that use the tool’s default ontology viewer. Although the OntologySource API provides access to terms, the source provides little information about how to render terms. Its “getTerms(...)” method allows all ontologies to be rendered as lists. However, to find out more rendering hints, the DefaultOntologyViewer applies Java reflection to the class which implements OntologySource. The viewer tries to determine what other ontology interfaces the class might implement. In Figure 5-10, the viewer interrogates MyOntologySource, a class that implements the OntologySource interface.

Figure 5-10: marker interfaces used by ontology sources to provide rendering tips. Pedro’s Default Ontology Viewer introspects ontology sources to determine what other interfaces the ontology source classes support.

MyOntologySource also implements TreeOntologySource, which means the viewer can present the ontology as either a list or as a tree. DictionaryDescriptionSupport is an interface which indicates that most ontology terms will have both a label and a text definition. This assumption allows the viewer to produce a “Dictionary View”, a table with fields for term and definition. URLDescriptionSupport indicates that most ontology terms will be associated with a web page. This assumption causes the viewer to render an HTML pane to show the web page. ImageDescriptionSupport carries an assumption that most terms will be associated with an image. This allows the viewer to produce a view of thumbnail images.

OntologyCaching indicates the ontology source can be updated. The interface has two methods: one to determine whether the source is outdated and another method which causes the source to update itself. The default viewer responds to the presence of the OntologyCaching interface by rendering an “Update” button if the ontology is out of date.
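The introspection described in this section can be sketched with the standard Class.isAssignableFrom call. This is a simplified sketch under stated assumptions: the interfaces below are empty stand-ins that only borrow the names used in this section (the real Pedro interfaces declare methods), ViewChooser and supportedViews are hypothetical names, and the view labels are illustrative strings rather than identifiers from the code base.

```java
import java.util.ArrayList;
import java.util.List;

// Empty stand-ins for the interfaces named in this section (assumption: real ones have methods).
interface OntologySource {}
interface TreeOntologySource extends OntologySource {}
interface DictionaryDescriptionSupport {}
interface URLDescriptionSupport {}
interface ImageDescriptionSupport {}

// Example source that can be shown as a list, a tree or a dictionary table.
final class MyOntologySource implements TreeOntologySource, DictionaryDescriptionSupport {}

// Hypothetical helper showing how a viewer might derive rendering options.
final class ViewChooser {
    static List<String> supportedViews(OntologySource source) {
        Class<?> sourceClass = source.getClass();
        List<String> views = new ArrayList<>();
        // getTerms(...) lets every source be rendered as a list.
        views.add("list");
        if (TreeOntologySource.class.isAssignableFrom(sourceClass)) {
            views.add("tree");
        }
        if (DictionaryDescriptionSupport.class.isAssignableFrom(sourceClass)) {
            views.add("dictionary");
        }
        if (URLDescriptionSupport.class.isAssignableFrom(sourceClass)) {
            views.add("web page");
        }
        if (ImageDescriptionSupport.class.isAssignableFrom(sourceClass)) {
            views.add("thumbnails");
        }
        return views;
    }
}
```

The appeal of this design is that a source author opts in to a richer view simply by implementing one more interface; the viewer needs no configuration entry for it.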
5.5.2.7 The OntologyContext Object

Ontology services can ask Pedro questions about what else is on the current form. The OntologyContext object retains knowledge about the currently selected field, the field which invoked an ontology service, the parent record of the current form record and the field values that appear on the form. An ontology source or ontology viewer can access this object through the following call:

    OntologyContext ontologyContext
        = (OntologyContext) pedroFormContext.getProperty(PedroFormContext.ONTOLOGY_CONTEXT);

Developers can use the OntologyContext object to help reduce the number of terms that are presented to the user. An example of this is provided in the “ontology” model example that comes in the distribution bundle’s ./other_models directory.

5.5.2.8 A Walkthrough for Selecting an Ontology Term

The process of marking up a form field with a term begins with the end-user right-clicking over a starred form field label. In the desktop application, the label will belong to pedro.desktopDeployment.TextFieldView. That label will be associated with the TextFieldView’s instance of OntologyServiceManager. This class manages the task of presenting a right-click menu and inserting terms into the text component of the form field.

The OntologyServiceManager listens to the right-click action on the form label and generates a popup menu. It will add a menu item for each ontology service registered for the field. If the form field does not allow free-text entry, then a “Clear” menu button is added to the popup menu. In createMenuItemForService(...), Pedro works out what to do in cases where a service has only an OntologySource, only an OntologyViewer or both.

When an ontology has fewer than 20 terms, the OntologyServiceManager tries to render terms as menu items. When there are between 21 and 40 terms, the manager object tries to render submenus. For larger ontologies, it simply makes a menu button that can cause the ontology viewer to pop up.
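The size-based choice described in the last paragraph can be sketched as a small decision function. MenuStrategy, Kind and forTermCount are hypothetical names, not taken from the Pedro code base. The prose leaves the case of exactly 20 terms unspecified, so placing it in the first band is an assumption of this sketch.

```java
// Hypothetical helper that mirrors the three presentation bands described above.
final class MenuStrategy {
    enum Kind { MENU_ITEMS, SUBMENUS, VIEWER_BUTTON }

    static Kind forTermCount(int termCount) {
        // Assumption: exactly 20 terms is treated like the "fewer than 20" band.
        if (termCount <= 20) {
            return Kind.MENU_ITEMS;
        }
        // 21 to 40 terms are grouped into submenus.
        if (termCount <= 40) {
            return Kind.SUBMENUS;
        }
        // Larger ontologies get a single button that opens the ontology viewer.
        return Kind.VIEWER_BUTTON;
    }
}
```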
When the user chooses a service from the menu, the OntologyServiceManager creates a DefaultOntologyTermSelectionListener to listen for when the end-user uses terms selected in the ontology viewer.

The OntologyViewer allows the end-user to select terms in some kind of display. The viewer will have some button which will indicate that the user wants to insert selected terms into the form field. This is when the OntologyViewer notifies the DefaultOntologyTermSelectionListener.

The DefaultOntologyTermSelectionListener asks the OntologyViewer to supply it with the selected terms. It then asks the OntologyViewer to provide an OntologyTermProvenance object for each selected OntologyTerm. The OntologyTermProvenance objects are added to the OntologyContext object, which keeps track of what terms have been used in the current form. The OntologyContext in turn submits the OntologyTermProvenance objects to the OntologyTermProvenanceManager, which retains knowledge about all ontology terms used to tag the data set. The DefaultOntologyTermSelectionListener also inserts the label for each selected term into the form text field.

When the end-user saves the data set to a native format *.pdz file, the OntologyTermProvenance objects held in the OntologyTermProvenanceManager are written to the *.meta file. This is how Pedro stores information about the ontology terms used to mark up form fields.

5.5.3 Design History

The first services in POSF were simple pull down menus which presented terms to biologists. They would select a term and it would appear in the appropriate text field. The choices came from enumeration types which were described in the schema. As the word lists for some fields grew larger and larger, a new way of providing controlled vocabulary terms to users was needed.
5.5.3.1 Decoupling Controlled Vocabularies from Data Models
The expressivity of an XML Schema’s enumeration types was too limited for listing dozens or hundreds of terms, some of which could be related hierarchically to one another. This provided the first compelling reason to make the design for Pedro decouple the schema from mark-up services. The second reason was that large community ontologies such as KEGG, MGED, GO and others evolved independently of one another and independently of whatever data entry model was associated with them. By necessity, the first Architecture Decision became:

POSF Decision 1: The data entry schema and the mark-up services will evolve autonomously at different rates. Therefore, decouple these things and support them through separate mechanisms.

This meant that Pedro would not be driven by a single monolithic data model which described form concepts as well as the values used to populate the fields. The design had to assume that the data entry schema could be developed before, during or after the development of corresponding mark-up services. The order of development appears to greatly influence what concepts are included or excluded in either kind of model. The scope of terms provided by a single service may not provide an appropriate match for the meaning of a form field. The generality of some form fields could warrant supporting mark-up from more than one service. Conversely, a large existing ontology could be used to populate multiple form fields. Therefore, there is an M:N relationship between mark-up services and form fields. This led to the next decision:

POSF Decision 2: The framework should be able to associate multiple mark-up services with the same form field.

5.5.3.2 Support for Stub Ontologies for Rapid Prototyping
Pedro was soon redesigned so that form fields could be linked with text files which contained simple term lists. This feature proved popular with end-users during the rapid prototyping phase of their use cases.
By this time, Pedro was being used to rapidly elicit requirements to support data entry activities. Changes made to the data model could be instantly reflected in changes to the forms. This feature allowed end-users to provide feedback on the model by trying to fill in the data entry forms with real data. This allowed non-technical biologists to participate in the modelling process. As part of that process, users would comment on sample key words that were used to fill in a form field. Initially choices for a field might be encoded as enumeration types in the schema. As users suggested more possible terms, the enumerations were removed from the schema and the terms began to be managed in small text files. After a while, users wanted to see their words as a hierarchy. Pedro’s design was altered to allow a data modeller to choose whether a text form field was associated with a single column text file or a tab indented text file. The important lesson learned from the first attempt at creating ontology services was that there was great value in having the system support simple ontologies which could be evolved through a text editor. End-users’ feedback on the controlled vocabulary (CV) could help inform ontology designers how to best manage these bespoke collections of terms in a more sophisticated network of concepts. Architecture Decision 3 became:

POSF Decision 3: The framework should support simple stub ontologies that can be used during rapid prototyping activities.

5.5.3.3 Basing the Framework on Identifiers Instead of Word Phrases
I thought using inserted phrases was sufficient for marking up form fields but then I began to learn more about using ontologies. The mark-up facility was limited to ensuring that values appearing in the form fields were spelled correctly. Although this ensured that lexical searches would encounter fewer typographical mistakes, the services did not attempt to capture the meaning of the terms.
For example, a service could insert the phrase “testosterone” into a field but not record whether it was being regarded as a steroid or a hormone. The ontologists with whom I consulted suggested that the services base functionality on ontology identifiers, not specific word phrases. In the example, an identifier such as http://www.medicalontology.org/version1/1056 could uniquely identify the steroid and http://www.medicalontology.org/version/4567 could uniquely identify the hormone by the same name. If data sets were tagged with ontology identifiers, then ontology services could apply sophisticated reasoning to provide more concise search results or results that were tagged with related terms. The reliance on identifiers also allowed Pedro to support controlled vocabularies in multiple languages such as English, Spanish and French. Each language would have different word phrases for the same concept, but the concept could be uniquely identified by software agents that supported semantic searches. The potential benefits of basing services on ontology identifiers led to the first requirement for a redesigned ontology framework:

POSF Decision 4: Base ontology services on ontology identifiers, not word phrases. Each ontology identifier will be associated with a word phrase, and optionally a definition, a URL that may describe a help web page, or an image.

5.5.3.4 Supporting Multiple Formalisms
By the first part of 2003, more sophisticated ontology technologies were maturing. DAML+OIL became popular and later this language influenced the development of its successor language OWL. These languages allowed ontology designers a more sophisticated means of organising and relating concepts. Along with the languages came software tools that allowed ontologists to relate domain concepts. The next major Architecture Decision for POSF was to determine whether it was better to support ontologies that used one technology or ontologies that used multiple technologies.
Single column and tab-indented text files were proving invaluable for rapid-prototyping efforts so it seemed clear the system should support these data formats for storing ontology terms. However, they seemed limited in their ability to support more sophisticated ways of finding the right terms to use for a form field. The emerging ontology technologies promised more powerful mark-up services that employed automated reasoning to derive new relationships amongst terms. Ontology reasoners could apply constraints to help limit the number and kind of mark-up terms that could be presented to an end-user. The feature could lend an air of artificial intelligence to Pedro which would allow the tool to guide users through data entry. The spectrum of possible ontology formalisms ranged from simple tab indented lists to very sophisticated OWL ontology files. It seemed that if only simple term lists were supported, then data sets would only ever be retrieved as a result of lexical searches. There seemed to be great advantages in using the new ontology technologies. However, I had a number of concerns about committing Pedro’s design to them. First, in 2003 and 2004, the DAML+OIL and OWL ontology technologies seemed to be in transition. If core code in Pedro were exposed to aspects of these formalisms, then program maintenance would become dependent on changes that were made to the other technologies. Second, the power promised by ontologies seemed to be matched by the skill level required to create and maintain them. The development of advanced ontologies seemed to require detailed knowledge of areas such as description logics. During that time, ontology research seemed to be promoted at a few key campuses across the world. The development of ontologies seemed to be centred in a few bioinformatics groups. I concluded that knowing how to build advanced ontologies was a craft that would lie outside the domain of expertise for staff on a typical bioinformatics project.
Many labs would therefore either have to invest resources training their own people about ontologies or outsource expertise on this topic to a few research centres. Manchester happens to be one such centre of excellence on ontologies and they made themselves available for me to ask questions of them. However, I felt that the tool would enjoy a broader uptake in the community if the software minimised its dependencies on institutional products and specialised technologies. Different groups would want to choose which formalism they wanted. The choice of technology could reflect legacy needs, biases towards emerging technologies or different levels of effort spent learning how to use one formalism or another. There also seemed to be a need to support services that were suited for rapid prototyping or for production purposes. These observations led to the next Architecture Decision:

POSF Decision 5: Support multiple formalisms. Do not limit support either for very simple or very sophisticated ontologies.

Supporting multiple formalisms necessitated the development of interfaces which would allow Pedro to interact with multiple mark-up services in a uniform way. The interfaces would shield the main application from details about how ontology terms were managed or related. The benefit of this approach is that data modellers can substitute ontology services without affecting the design of the data entry schema or the rest of the program that renders it as forms. Using an adapter design pattern, Pedro could interact with services which relied on something as simple as a single column text file or as sophisticated as an OWL file.

5.5.3.4 Decoupling Aspects of Model and View in an Ontology Service
It was apparent from the early stages of the Pedro project that the people who maintained ontologies had different needs than the people who used them. Ontologists use a variety of software tools to build their ontologies.
They may edit text files with WordPad, make acyclic graphs of terms using DagEdit, or create other ontologies using tools such as OWLEditor. Each tool may store terms in a different file format. A generic interface for mark-up services had to account for the different sources which can provide terms. The interface would be designed for the benefit of people maintaining the ontologies. Visualising those terms is a separate concern. Ontology terms can be presented to users in a number of ways including a list, a tree, a table or some other form of graphical display. The interface would also account for the different ways of rendering terms and be designed for the benefit of people using the ontologies. This led to the next architecture decision:

POSF Decision 6: Let each ontology service comprise one or both of an OntologySource and an OntologyViewer. Each of these objects is described by an interface. An OntologySource provides terms and is designed on behalf of those who maintain ontologies. An OntologyViewer renders terms provided by the OntologySource, and is designed on behalf of those who use ontologies. The ontology service may be configured to mix and match an OntologySource with an OntologyViewer.

5.5.3.5 Consider Local and Remote Ontology Sources
Some community groups such as MGED post the latest version of their ontology on a web site. In other use cases, ontologies are locally maintained word lists. To support both cases, the services would have to consider collections of terms that are maintained locally or remotely. The next decision became:

POSF Decision 7: Make the design of an OntologySource consider whether terms are maintained locally or remotely.

5.5.3.6 Accommodate Updating in Ontology Sources
Following on from the previous design decision, it is reasonable to expect that an ontology source could become outdated. The framework needed some mechanism for asking an OntologySource whether it needed to be updated.
In some cases, updating could be done automatically to allow Pedro to present users with the latest terms. In other cases, a local ontology could contain locally evolved terminology or it could present the version of an ontology that a laboratory was most confident in using. Here, automatic updates should not be done but be left to the discretion of the end-users. This led to:

POSF Decision 8: The framework should provide some way of determining whether an OntologySource needs to be updated. End-users should be able to decide whether the ontology service updates itself.

5.5.3.7 Provide Meta Data about Ontology Services
As ontologies evolve, it is important to keep track of what versions were used to tag data sets. Therefore:

POSF Decision 9: Require ontology services to provide meta data information about the ontologies. This information should include the name, author, version, description and kind of formalism supported by an ontology.

A couple of years ago, I inquired about what standard there was for describing aspects of an ontology service. From my investigation it seemed like the ontology community didn’t have a clear idea of what kind of meta-data should be gathered about a term or a service. This led me to guess what kinds of attributes should be recorded for each term. Because ontology identifiers aren’t meaningful to end-users, there wasn’t a reason to include them when terms were inserted into form fields. However, the information had to be maintained somehow.

5.5.4 Scope of Effect
The Pedro Ontology Services Framework is used in all of the Pedro tools created thus far including the desktop Pedro application, the Tablet Pedro application, the Pedro Configuration Tool, the Pedro Meta Data Editor, the Pierre Configuration Tool and most of the search and retrieval applications generated by Pierre. Within the Pedro code base, POSF features are referenced in pedro.desktopDeployment.TextFieldView and pedro.tabletDeployment.TextFieldView.
5.5.5 Relevant Code Packages
The packages relevant to POSF include:
- pedro.soa.ontology.sources
- pedro.soa.ontology.views
- pedro.soa.ontology.provenance

The schema for ontology services is defined in the ./models/pedro_form_configuration/model/pedro_form_configuration.xsd file. The meta data retained for each ontology term is defined in the ./models/pedro_meta_data/model/PedroMetaData.xsd file. Examples of implementations of ontology services can be found in the “ontology” model folder that comes with the download. You can find it under the “./other_models” folder.

5.6 ID Generator Services

5.6.1 Purpose
To provide an identifier that can be used to populate an identifier field.

5.6.2 Description
The IDGeneratorService interface is described in Figure 5-11. The interface extends ServiceClass in the same way that OntologySource, OntologyViewer, DocumentValidationService, RecordValidationService and other service classes do.

Figure 5-11: the IDGeneratorService

IDGeneratorService has two main methods. The generateKey(...) method is used to generate a String value that Pedro inserts into the attribute form field. excludeKey(...) is used when a data file is read. ID values found in existing records are excluded so that generateKey(...) doesn’t produce a key which already exists in the data set.

5.6.3 Design History
Many of the use cases in bioinformatics use identifiers to uniquely label experiment records. For example, if a data set describes a number of samples, they would each get a unique identifier so they could be processed better by analysis programs. Unlike ontology terms, identifiers don’t have a semantic value. However, identifiers may have naming conventions which use domain-specific phrasing. For example, a unique identifier for a sample might include the name of the laboratory where the work was done. Pedro needed some kind of service which could generate unique keys. This was the incentive for developing the IDGeneratorService interface.
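The interplay between excludeKey(...) and generateKey(...) can be illustrated with a small stand-alone sketch. The parameter lists of the real methods are elided in this document, so the class below does not implement IDGeneratorService; LabPrefixedIdGenerator, its constructor, and the simplified signatures excludeKey(String) and generateKey() are assumptions made purely for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical stand-alone generator; it mimics the two-method idea described
 * above but does not use Pedro's real IDGeneratorService signatures.
 */
class LabPrefixedIdGenerator {

    private final String prefix;                          // e.g. a laboratory name
    private final Set<String> usedKeys = new HashSet<>(); // keys that must not be reissued
    private int counter = 0;

    LabPrefixedIdGenerator(String prefix) {
        this.prefix = prefix;
    }

    /** Called for each ID found while reading an existing data file. */
    void excludeKey(String existingKey) {
        usedKeys.add(existingKey);
    }

    /** Returns a key that has been neither excluded nor generated before. */
    String generateKey() {
        String candidate;
        do {
            counter++;
            candidate = prefix + "-" + counter;
            // Set.add returns false when the key is already present, so keep trying.
        } while (!usedKeys.add(candidate));
        return candidate;
    }
}
```

For example, after excluding "manchester-1" and "manchester-2", the first generated key would be "manchester-3", which matches the naming-convention example of embedding a laboratory name in the identifier.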
5.6.4 Scope of Effect
ID Generator services are only ever used for attribute fields.

5.6.5 Relevant Code Packages
IDGeneratorService is defined in soa.id.IDGeneratorService. It will appear in pedro.desktopDeployment.IDFieldView. The class name for an IDGeneratorService will be stored in pedro.mda.config.AttributeFieldConfiguration.

5.7 Plugins

5.7.1 Purpose
To allow developers to extend the functionality of the data entry tool with code modules that perform domain-specific tasks.

5.7.2 Description
Pedro supports plugins that can have a scope of effect for the current field, the current record or the current document. To make plugins, developers must create a Java class which implements pedro.soa.plugins.PedroPlugin. This interface is shown in Figure 5-12.

Figure 5-12: a Java class implementing the PedroPlugin interface.

getDisplayName() provides the name used to advertise the plugin in a menu, button or list.
getDescription() returns a description of what the plugin does.
isWorking() is a diagnostic method used to help Pedro determine whether a plugin should be included as a service that can be used by the end-users.
isSuitableForRecordModel(...) is used to help limit the use of the plugin. It takes as arguments a model stamp that describes the version of the schema and a record class name, which indicates the record type of the currently displayed record. Both parameter values are supplied automatically by the system. Plugin developers can use the information to determine whether it is appropriate for the tool to register the plugin to suit the current data entry task.

Plugin developers can make their plugin classes implement other marker interfaces such as AnalysisPlugin, DataExportPlugin and DataImportPlugin. Pedro uses the information to produce a summary of different available plugins; the results are written to the status bar of a Pedro dialog.

To ensure that their plugins are detected by the tool, developers must follow these three steps:
1. produce a JAR file containing the plugin classes
2. rename the extension of the file from *.jar to *.plugins
3. move the renamed file into the “lib” directory of the model folder

Plugins are associated with different parts of the application via the Pedro Configuration Tool. When the data entry tool is running, plugins may appear in different places. Document-level plugins will appear in one or more of the menus in the menu bar. If plugins are associated with a given record type, then a “Plugins...” button will appear flush top-right in the main form whenever end-users are editing an instance of that kind of record. If plugins are associated with a field, the same button will appear at the end of the form field.

5.7.3 Design History
There are two significant differences between the plugin systems developed for Pedro v1.9 and Pedro v2.0.

Pedro 2.0 plugins can have a field, record or document-level scope of effect. Previous releases only supported document and record-level plugins. Document-level plugins would only appear in Pedro’s File Menu, and would have to implement a special “RecordImporter” interface.

The new plugins also have a different execution method. Pedro 1.9 plugins used a process(...) method whose parameters were appropriate for the desktop deployment but not the tablet deployment of the data entry tool. The method was renamed “execute(...)” and passed a single pedroFormContext parameter which could hold as many other parameter values as developers wanted.
To obtain the values that used to be passed as parameters to process(...), follow this code example:

RecordModelFactory recordModelFactory = (RecordModelFactory) pedroFormContext.getApplicationProperty(PedroApplicationContext.RECORD_MODEL_FACTORY);
NavigationTree navigationTree = (NavigationTree) pedroFormContext.getDocumentProperty(PedroDocumentContext.NAVIGATION_TREE);
RecordModel currentRecordModel = (RecordModel) pedroFormContext.getProperty(PedroFormContext.CURRENT_RECORD_MODEL);

5.7.4 Scope of Effect
Plugins are associated with menus, records and fields via the Pedro Configuration Tool. Provided that plugins implement the PedroPlugin interface, changes in customised services shouldn’t affect the rest of the code base.

5.7.5 Relevant Code Packages
The Pedro plugin classes are defined in the pedro.soa.plugins.* package. Examples of Pedro plugins can be found in the pedro.configurationTool.* package, which features plugins used in the Pedro Configuration Tool.

5.8 Configuration System

5.8.1 Purpose
To manage options for configuring a data entry application that are not expressed in the XML schema.

5.8.2 Description
Pedro interprets the XML Schema to determine properties of the data entry application. However, many of the configuration options can’t be expressed in the XML Schema language, so they are managed in a ./[model]/config/ConfigurationFile.xml file. This file maintains a collection of mappings that link schema concepts to different kinds of properties. The Pedro Configuration System manages this file and provides configuration options to the rest of the system. It has three main aspects:
- the Pedro Configuration Tool
- the ConfigurationReader class
- data structures used to manage configuration data

5.8.2.1 Pedro Configuration Tool
The Pedro Configuration Tool is an instance of the Pedro tool which has been configured with plugins that suit configuring data entry applications.
The tool has its own separate tutorial; discussion here will be limited to describing the code that provides the functionality. Most of the classes relevant to this discussion appear in the pedro.configurationTool.* package. The main class for the tool is pedro.configurationTool.PedroConfigurationTool. The Pedro Configuration Tool runs off a schema defined in the ./models/pedro_form_configuration model folder. This model describes records such as “RecordModel” and “EditField” which hold configuration data and correspond to Pedro’s native data structures. For example, an EditField record has a field called “help_link”, which specifies a URL for a page that is displayed for context-sensitive help. The “EditField” record corresponds to Pedro’s pedro.mda.model.EditFieldModel data structure. For some of the configuration records, data modellers have to provide the name of a record or field in a target schema. For example, the previously described “EditField” record may have “patient_name” for the value of the field name. This is an example of a reference which is used to link the edit field “patient_name” found in the target schema with configuration properties such as “help_link” described in the ConfigurationFile.xml file. To help reduce data entry errors, the Pedro Configuration Tool has a number of ontology services and plugins which help automatically populate configuration records with this linking information. However, in order to support this activity they need knowledge of the records and fields that appear in the target schema. To obtain this information, PedroConfigurationTool prompts the end-users to select a target model folder. This folder is expected to contain an XML Schema. However, it is not expected to contain an existing configuration file because this is what the Configuration Tool is supposed to produce. The tool reads the model folder containing the target schema and holds information about it in a special Pedro context object. 
This object is then stored with the key “TARGET_SCHEMA_APPLICATION_CONTEXT” within the configuration tool’s own PedroApplicationContext variable. The tool’s plugins use information held in this object to help fill in the linking information expected in the configuration records. The plugin pedro.configurationTool.DefaultConfigurationFileCreationPlugin provides a good example of how information in the target schema application context is used to fill in configuration records.

5.8.2.2 ConfigurationReader
Data modellers use the Pedro Configuration Tool to produce the file “ConfigurationFile.xml”, which describes all the application properties that are associated with XML Schema concepts. When the Pedro application starts, it reads the configuration file and holds the information in instances of data container classes. These classes have properties which are analogous to classes defined in the XML Schema for the Pedro Configuration Tool (see Appendix A). Various parts of Pedro use the ConfigurationReader to obtain configuration data that are used to render the application. For example, pedro.desktopDeployment.TextFieldView uses the ConfigurationReader to determine whether it should link an edit field defined in the domain schema to ontology services. In another case, Pedro’s menu classes use the ConfigurationReader to determine which standard menu items should be included for display.

5.8.2.3 Other Configuration Files
Pedro maintains two other configuration files in the config directory of a model folder:
- SessionAspects.xml
- FileExtensionsToLaunch.xml

The SessionAspects.xml file contains information about the most recently accessed files, and is managed by the class pedro.mda.config.SessionManager. Pedro uses a list of recently accessed files to create the “Favourites” sub-menu located in the File Menu. It also uses session information to set the default starting directory for when users open files.
FileExtensionsToLaunch.xml associates file extensions with shell commands used to launch other software applications. The mappings are used when users press the “View” button on a URL Field View. If the file specified in the text field ends with a recognised extension, Pedro will try to launch an application by making a system call. It remains the only configuration file that end-users still have to configure. Currently, the SessionManager uses the class FileLauncher to parse the file of mappings. In future, this file will be eliminated in favour of having the same information represented as properties in the Pedro Configuration Tool.

5.8.3 Design History
Prior to Pedro 2.0, data modellers had to craft the ConfigurationFile.xml file by hand. The file was awkward and time-consuming to maintain, especially when large XML schemas were being used. This led to the idea of using Pedro to edit its own configuration files. The Pedro Configuration Tool was developed using a schema of configuration properties and a collection of plugins which helped data modellers fill in the forms. The advent of the Pedro Configuration Tool meant that a configuration file could be designed far more rapidly than it could in previous releases. Enshrining configuration options in an XML Schema also meant it was easy to add new properties. The configuration options for Pedro expanded to include the selective inclusion of menu items and services which could be associated with a field, record or document scope of effect. With new features came data container classes that held the property values in memory. The expansion resulted in a configuration file that can accommodate 59 more options. The only drawback of enhancing the configuration system is that configuration files developed in Pedro v1.9 are not compatible with those made in Pedro v2.0.
The configuration files produced by the Pedro Configuration Tool and the Pierre Configuration Tool share many of the same properties that are associated with concepts in a target schema. To allow Pierre to re-use Pedro code, the configuration reader classes for each tool were made to implement a pedro.mda.config.SchemaConceptManager interface. Some of the calls to ConfigurationReader that once appeared in the Pedro code base have been replaced by calls to SchemaConceptManager.

5.8.4 Scope of Effect
The ConfigurationReader is referenced in the file menu classes to determine what features should be included. Many of the classes which generate form fields interact with the ConfigurationReader via the SchemaConceptManager interface.

5.8.5 Relevant Code Packages
The code for the PedroConfigurationTool can be found in the pedro.configurationTool.* package. Code for ConfigurationReader and the data container classes can be found in the package pedro.mda.config.*.

5.9 IO

5.9.1 Purpose
To store and retrieve form data managed by Pedro.

5.9.2 Description
Pedro normally stores a data set as a zipped file ending in a *.PDZ file extension. The zipped file contains a number of XML files, each of which represents a layer of information. Currently there are two layers: the data layer and the meta-data layer. The data layer is represented by the .PDR file and contains the text that would appear in form fields. The tags found in the data layer will be defined in the target schema used to drive the data entry application. The meta-data layer is represented by the .META file and contains meta data about the data set, including basic information about the author and about all the ontology terms which were used to mark up form fields. The tags found in the meta-data layer are defined in the schema ./models/pedro_meta_data/model/PedroMetaData.xsd. The IO system for creating PDZ files can be extended to include other information layers (see Design for Extensibility section).
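The layered layout described above can be explored with the standard java.util.zip classes. The following is a minimal sketch and not Pedro's own IO code: the class PdzLayerSketch, its method names and the entry names in the usage note are hypothetical, and the sketch simply treats every entry in the archive as one layer of XML text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper; illustrates a zip archive holding one XML file per layer. */
class PdzLayerSketch {

    /** Writes each (entry name, XML text) pair as one entry of an in-memory zip archive. */
    static byte[] writeLayers(Map<String, String> layers) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Map.Entry<String, String> layer : layers.entrySet()) {
                zip.putNextEntry(new ZipEntry(layer.getKey()));
                zip.write(layer.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Reads every entry of a zipped stream into a map of entry name to XML text. */
    static Map<String, String> readLayers(InputStream zipped) {
        Map<String, String> layers = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int count;
                // read(...) returns -1 at the end of the current entry
                while ((count = zip.read(chunk)) != -1) {
                    buffer.write(chunk, 0, count);
                }
                layers.put(entry.getName(),
                        new String(buffer.toByteArray(), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return layers;
    }
}
```

A round trip could write two hypothetical entries, "sample.pdr" and "sample.meta", with writeLayers(...) and recover them with readLayers(new ByteArrayInputStream(bytes)). Because readLayers accepts any InputStream, the same code would work on a file or on a stream handed over by another component, and a further layer would just be one more entry.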
Pedro can export a data set as an XML file which will only contain information from the data layer. This export feature appears in the “Export to Final Submission Format” menu option but will probably be relabelled something more appropriate in the future.

5.9.3 Design History

5.9.3.1 Use of Layers
Originally, Pedro stored a data set as a single XML file. The need to store a data set as a collection of layers arose with the development of ontology services. Initially, ontology services provided text phrases that would be pasted into forms. However, an ontology term is not adequately represented by a word phrase. Ontology terms were eventually redesigned to use a human-readable label and a machine-readable identifier (see Section 4.5). Although the labels for selected ontology terms were stored in form fields, Pedro needed some mechanism for storing information about the unique identifiers. Initially, ontology terms were written as hyperlinks in the XML data file. They were stored in the form:

<a href="[unique_identifier]">label</a>

The problem with this approach was that data sets marked up with ontology terms would fail to validate against the XML schema. This was because the schema would not describe the “<a>” tag which appeared within the tags for a form field. Rather than treating “<a>” as a tag with special significance, I decided to store ontology term identifiers in a separate meta data file that would accompany the data file. Pedro was modified so that its data sets were stored in ZIP files that contained multiple information layers. Figure 5-13 illustrates the structure of a native format *.pdz file.

Figure 5-13: the structure of Pedro’s native format *.PDZ file. It contains a *.pdr file which holds the form data and a *.meta file which holds the meta data about the data set.

5.9.3.2 Changing Parsers
Pedro used to rely entirely on the DOM parser and still uses it for parsing the meta data file.
The DOM parser works by parsing an XML file and producing an in-memory tree of DOM model objects. The API for DOM objects made it easy to extract information from the XML file. The parser performed well with small data sets but exhibited performance problems when it was used to process large data sets. This is because the parser loaded an entire XML file into memory before the DOM objects could be used. The application experienced great performance gains in reading files when some of the I/O classes began using the SAX parser.

5.9.3.3 Support for Streams
Pedro IO files were modified so they could accept data streams instead of just files. This was done to make it easier to deploy Pedro as a component rather than as a standalone application. In a component mode of activity, Pedro may receive its data input as a stream coming directly from another component.

5.9.3.4 Creating the “Export to Final Submission” Feature
Pedro used to allow end-users to export native format *.pdz files to *.xml files that only contained the data layer of information. The *.xml files tended to be candidate files for submission to data repositories. I thought it was a good idea to rename this format to “Export to Final Submission Format” and cause the menu feature to validate the document. If there were any errors, Pedro would not create the *.xml file. This action ensured that end-users fixed all the errors before they sent their files off to repository managers.

5.9.3.5 Providing Support for the Meta Data Layer
For most of its development cycle, Pedro has saved an arbitrary collection of meta data in the *.meta layer. Typically this focused on recording which ontology terms were used to tag a particular kind of schema concept such as a form field or record. Whenever a new attribute was added, it resulted in changes made to special I/O routines which read and wrote meta data records. In 2007, the *.meta layer was given its own distinct schema for meta data.
Each Pedro tool now loads the pedro_meta_data model and uses a special context variable (see Section 4.3) to help read and write meta data records. These records are maintained independently of the form data that end-users edit through the normal use of the tool.

A new utility has been designed which will allow data curators to edit just the meta data layer of a given *.pdz file. The Pedro Meta Data Editor uses the same pedro_meta_data model but allows curators to post-annotate a *.pdz file. Curators can now remove ontology terms that were used to tag records and fields. Alternatively, they can add more terms using the same ontology services that are available to end-users. With the new support for the *.meta layer, data curators can change the meta data about a data set without editing the data themselves. The layers can be maintained completely independently of one another.

5.9.3.6 Merging dataImport and IO Class Packages

Pedro v1.9 had separate packages for classes that managed Pedro data files and those that were used to import data from or export data to spreadsheets. Now all of the classes in the pedro.dataImport.* package have been moved into the pedro.io.* package.

5.9.4 Scope of Effect

Most of the I/O classes are defined in the pedro.io.* package. Whereas the meta data used to be managed by the pedro.io.MetaDataReader and pedro.io.MetaDataWriter classes, meta data records are now read and written using the normal PedroDataFileReader and PedroDataFileWriter classes respectively. Most of the I/O classes are called from the pedro.desktopDeployment.FileMenu or pedro.tabletDeployment.FileMenu classes.

5.9.5 Relevant Code Packages

The I/O classes appear in pedro.io.*. PedroDataFileReader/Writer are used to manage the .PDR files that represent the data layer of each data set. NativeDataFileReader/Writer uses these classes when it manages the zipped .PDZ files. XMLSubmissionFileReader/Writer wraps PedroDataFileReader/Writer and produces .XML files.
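
The layered ZIP format, the stream support and the split between SAX and DOM parsing described in Section 5.9.3 can be pictured with a small, self-contained sketch. It is not Pedro's code: the class PdzLayerSketch, its method names, the entry names "sample.pdr" and "sample.meta" and the XML content are all made up for illustration, and only standard JDK classes are used. The sketch builds a two-layer archive in memory, reads it back from an InputStream rather than a file, counts records in the data layer with a streaming SAX handler, and reads the small meta layer with DOM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: none of these names are Pedro classes.
class PdzLayerSketch {

    // Build a tiny two-layer archive in memory (entry names and XML are invented).
    static byte[] buildSampleArchive() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            writeEntry(zip, "sample.pdr",
                "<experiment><sample><name>A</name></sample>"
                + "<sample><name>B</name></sample></experiment>");
            writeEntry(zip, "sample.meta", "<meta><title>Demo</title></meta>");
        }
        return bytes.toByteArray();
    }

    static void writeEntry(ZipOutputStream zip, String name, String xml) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(xml.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    // Read every layer from a stream (not a file), keyed by file extension.
    static Map<String, byte[]> readLayers(InputStream in) throws IOException {
        Map<String, byte[]> layers = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                String extension = name.substring(name.lastIndexOf('.') + 1);
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                layers.put(extension, buffer.toByteArray());
            }
        }
        return layers;
    }

    // Data layer: stream through the XML with SAX, never building a tree.
    static int countElements(byte[] xml, String elementName) throws Exception {
        int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(xml),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attributes) {
                    if (qName.equals(elementName)) {
                        count[0]++;
                    }
                }
            });
        return count[0];
    }

    // Meta layer: small enough to load whole with DOM.
    static String readTitle(byte[] xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        return document.getElementsByTagName("title").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Map<String, byte[]> layers =
            readLayers(new ByteArrayInputStream(buildSampleArchive()));
        System.out.println("layers: " + layers.keySet());
        System.out.println("sample records: " + countElements(layers.get("pdr"), "sample"));
        System.out.println("title: " + readTitle(layers.get("meta")));
    }
}
```

Keying the layers by extension mirrors the idea that each layer can be read on its own: a dissemination system could look only at the "meta" entry and leave the much larger data layer untouched.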
5.10 Alerts

5.10.1 Purpose

To provide a means of allowing end-users to compile their own lists of errors, warnings and tips that can be used by other researchers in the community when they validate their data sets. It is intended as an extensible way to enhance the validation facilities of the tool and as a passive way for researchers to communicate with each other.

5.10.2 Description

Pedro supports an Alerts System to supplement the tool’s standard validation facilities and to give end-users a way of using advice provided by other end-users. An alert is a set of matching criteria which identifies patterns of field value combinations in a record. The matching criteria can be associated with one of four intents:

- error
- warning
- information bulletin
- request for the user to contact someone else

Domain experts can use the Pedro Alerts Editor to create a collection of alerts called an Alert Bundle. The bundle is a ZIP file that contains a small XML file to represent each alert. Alert bundles can be included as part of the release of a new model folder, or they can be hosted at some URL. Other end-users can import these alert bundles and use them in two situations:

- when they attempt to export the document to a final submission format
- when they use the “Show Errors” feature in the View menu

When either of these actions is taken, the tool scans the current document and identifies any records which match an alert. The results are included with any errors the system identifies in the document. Should any of the alerts represent errors, the task of exporting data to a final format will fail.

5.10.3 Design History

The Alerts system was originally developed because Pedro had no way of identifying errors which were due to combinations of field values found within the same record. Individual field values could be validated but they could still present errors when they were considered in conjunction with other field values on the same form.
For example, a patient record form could have fields “gender” and “cancer type”. “male” and “ovarian cancer” could each be legal values for their respective fields but together represent an illegal combination of values. The system was generalised to include warnings and other kinds of messages that might prove useful in an activity of standardised data entry.

Now that Pedro supports field validation services and general plugins at the field, record and document levels of data entry activity, it is unclear whether this system will become more popular. The Pedro Alerts system is limited in that it will only identify patterns of values found within the same record. However, the system allows end-users to enhance the tool’s validation capabilities without having to code plugins. This benefit may prove important in settings where few software developers are available to make validation plugins.

Pedro’s validation package has been modified so that validation routines return a collection of alerts rather than a String that could contain an error message. This was done to allow validation plugins to return results that reflect different kinds of data quality.

5.10.4 Scope of Effect

The Alerts package is becoming more intertwined with the Validation package. Validation services described in the pedro.soa.validation.* package are required to return a collection of Alerts when they validate a field, record or document. Many of the error messages thrown by parts of the system are encapsulated in a SystemErrorAlert(..) object, which is a kind of Alert.

5.10.5 Relevant Code Packages

Most of the classes used to support alerts functionality are found in the pedro.soa.alerts package. The Alerts Editor can be run by invoking pedro.desktopDeployment.PedroAlerts.

- pedro.soa.alerts.*

5.11 Meta Data System

5.11.1 Purpose

To capture and isolate meta data about a document.
The meta data forms a summary view that can be used for simple search and retrieval operations supported by data dissemination systems. Pedro was designed to accommodate large data files in bioinformatics. A layer of meta data was added to the native format *.pdz files so that data dissemination systems could interpret a small meta data file before having to find search criteria in the much larger data layer.

5.11.2 Description

Pedro has an internal system for maintaining meta data about documents. The kinds of meta data that are managed include:

- summary information including the title, author, e-mail, institution and description of the data set
- the number of each kind of record that appears in the data set
- provenance data about all the ontology terms that are used to mark up form fields

Some of this information is provided by the end-users. Figure 5-14 shows the dialog that appears when they select the “Describe this document” feature from the Options menu. The title, author, e-mail, institution and description values are saved as part of the meta data for the document.

Figure 5-14: the meta data dialog that appears when end-users select the “Describe this document” option in the Options menu of a Pedro dialog. When end-users use the dialog to create a summary of their data set, the information is stored in the *.meta layer.

The remaining meta data are captured automatically. Pedro monitors how many instances of each kind of record appear in the document. When end-users use ontology services to mark up form fields, the tool asks the services to provide provenance data about all the selected terms. These data are also stored as meta data.

Meta data are stored as an information layer within the *.pdz native file format (see Section 5.9). The layer is expressed as an XML data file that is defined by a schema. Using the Pedro Meta Data Editor, data curators can edit the meta data file independently of the form data.
They can edit summary information or ontology terms to suit changes in the way documents are classified in a data repository. The following sections describe aspects of the meta data system in more detail.

5.11.2.1 Walkthrough for Capturing Ontology Term Meta Data

The process of committing meta data about ontology terms to file begins when an end-user right-clicks on the label of a form field that supports ontology services. This walkthrough traces the activity in the desktop deployment of the tool.

If an instance of pedro.desktopDeployment.TextFieldView has been linked with ontology services, the object associates its starred form label with pedro.soa.ontology.views.OntologyServiceManager. This class is responsible for presenting the available ontology services to end-users and ensuring the selected terms appear in the text field. OntologyServiceManager delegates the task of listening to right-click mouse actions to a ServiceMenuListener class.

When the ServiceMenuListener detects a right-click over the form label, it causes the OntologyServiceManager to generate a popup menu of available services. If a service has fewer than 40 terms, it attempts to render them as menu items. Otherwise, it displays a “Select terms...” button which causes an OntologyViewer to display the terms.

When an ontology service is selected, the OntologyViewer is associated with an OntologyTermSelectionListener. When end-users have selected terms for mark-up, the viewer notifies the OntologyTermSelectionListener. The listener is then supposed to ask the viewer to return meta data about each term that has been selected. OntologyViewer obliges by returning a collection of OntologyTermProvenance objects.

Pedro makes use of a DefaultOntologyTermListener which performs the mark-up action. It adds the OntologyTermProvenance objects to OntologyContext, which maintains information about the content of fields that are currently displayed.
OntologyContext in turn adds the provenance objects to OntologyTermProvenanceManager. This object maintains information about all the ontology terms that have been used to mark up the whole document. The OntologyTermProvenanceManager is owned by a DocumentMetaData object, which holds information about meta data for the whole document. This is the object that is used by NativeFileFormatWriter and NativeFileFormatReader to write the meta data information to, and read it from, a *.meta XML file.

5.11.2.2 The Pedro Meta Data Editor

The Pedro Meta Data Editor is an instance of Pedro that has been customised to edit the meta data layer of *.pdz files. The forms for the tool are generated from the schema described in Appendix C. The code used to make the plugins is explained further in Section 9.7.

The Meta Data Editor allows data curators to alter the meta data layer independently of the data layer. The editor is designed to let them annotate the document with terms that come from the same ontology services that are available to a regular end-user.

5.11.3 Design History

Section 5.9 describes much of the design history that led to developing code to support document meta data. Up until Pedro v1.9, the meta data were maintained automatically by the tool and there was no simple way that a data curator could edit the file. The amount of meta data gathered about a document depended on how often end-users made use of ontology services.

It became clear that Pedro should support the needs of data curators. We observed that most end-users wanted to minimally fill in a document so they could fulfil requirements set out by journal publications. However, data curators were concerned with ensuring that documents were sufficiently tagged so that they could be detected in search operations applied to a data repository. The recognition of a data curator as a new kind of user led to the development of the Pedro Meta Data Editor.
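
The separation that motivated the Meta Data Editor, and the ownership chain traced in Section 5.11.2.1, can be pictured with a minimal sketch. Everything in it is a hypothetical stand-in: MetaLayerSketch, TermProvenance, ProvenanceManager, DocumentMeta, the field path "sample/tissue" and the identifiers "EX:0001" and "EX:0002" are invented for illustration and do not reproduce Pedro's classes or signatures. It shows only the idea that a document-wide manager records which terms tag which field, so that a curator can add or remove terms without touching the form data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; these are not Pedro's classes or signatures.
class MetaLayerSketch {

    // One selected ontology term: human-readable label plus machine-readable identifier.
    static final class TermProvenance {
        final String label;
        final String identifier;

        TermProvenance(String label, String identifier) {
            this.label = label;
            this.identifier = identifier;
        }
    }

    // Document-wide record of which terms have been used to tag which field.
    static final class ProvenanceManager {
        private final Map<String, List<TermProvenance>> termsByField = new LinkedHashMap<>();

        void addTerm(String fieldPath, TermProvenance term) {
            termsByField.computeIfAbsent(fieldPath, key -> new ArrayList<>()).add(term);
        }

        // Returns true if at least one term with that identifier was removed.
        boolean removeTerm(String fieldPath, String identifier) {
            List<TermProvenance> terms = termsByField.get(fieldPath);
            return terms != null && terms.removeIf(t -> t.identifier.equals(identifier));
        }

        int termCount(String fieldPath) {
            List<TermProvenance> terms = termsByField.get(fieldPath);
            return terms == null ? 0 : terms.size();
        }
    }

    // Whole-document meta data, which owns the provenance manager.
    static final class DocumentMeta {
        String title = "";
        final ProvenanceManager provenance = new ProvenanceManager();
    }

    public static void main(String[] args) {
        // The data layer: form values entered by an end-user.
        Map<String, String> formData = new LinkedHashMap<>();
        formData.put("sample/tissue", "liver");

        // The meta layer: an end-user's mark-up, then a curator's later edits.
        DocumentMeta meta = new DocumentMeta();
        meta.provenance.addTerm("sample/tissue", new TermProvenance("liver", "EX:0001"));
        meta.provenance.addTerm("sample/tissue", new TermProvenance("hepatic tissue", "EX:0002"));
        meta.provenance.removeTerm("sample/tissue", "EX:0001");

        // The form value is unchanged; only the meta layer was edited.
        System.out.println(formData.get("sample/tissue")
            + " / terms: " + meta.provenance.termCount("sample/tissue"));
    }
}
```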
5.11.4 Scope of Effect

Pedro’s meta data classes are used extensively by the ontology services and are used by the native file format I/O classes to create the *.meta layer in each *.pdz file.

5.11.5 Relevant Packages

Most of the meta data classes appear in the pedro.metaData.* package. The OntologyTermProvenanceManager used to manage meta data about ontology terms appears within the pedro.soa.ontology.provenance.* package.

5.12 Form Generation Facilities

5.12.1 Purpose

To generate UIs for Pedro which suit displays on desktop and Tablet PCs.

5.12.2 Description

Pedro’s architecture maintains a rigid separation between the structures that hold data and the structures that present them to an end-user. The model aspects are represented by the native data structures described in Section 5.2. The view aspects are represented by the following packages:

- pedro.desktopDeployment.*
- pedro.tabletDeployment.*

5.12.2.1 General Classes for Generating Desktop Pedro Forms

Figure XXX describes the major classes used to render forms in Desktop Pedro. When end-users invoke the “run_pedro” script, it spawns an instance of PedroApplication. The object prompts end-users to load a model and it produces template record definitions that can be used to populate a document. PedroApplication creates an instance of an empty PedroDialog, which is the main window for the data entry tool.

Each PedroDialog has a PedroMenuBar containing a variety of menus subclassed from PedroMenu. PedroMenu contains general-purpose code for handling plugins. Each PedroDialog will also have a NavigationTree, which is the tree display that appears on the left part of the window. It manages a tree of NavigationTreeNode objects that mirror the tree of RecordModel objects which comprise the document. The third major UI component of the PedroDialog is the RecordView object that displays the form fields for the currently selected record in the NavigationTree.
The RecordView comprises a RecordViewTitle which appears flush left at the top of the panel, and a collection of DataFieldView objects which render form fields. Each DataFieldView object uses a corresponding DataFieldModel object that is part of the currently selected RecordModel object.

Figure XXX: major classes used to generate forms for Desktop Pedro.

5.12.2.2 Classes for Generating Edit Fields in Desktop Pedro Forms

Figure XXX describes a more detailed view of UI classes that are used to render text fields, identifier fields, combination box fields and radio fields. EditFieldView contains code which can render properties of a corresponding EditFieldModel object. It is responsible for rendering a “Plugins” button for any form field that has been associated with a collection of plugins.

The other field view classes use properties defined in corresponding edit field model objects. RadioFieldView uses the choices provided by GroupFieldModel to render a group of radio buttons. If GroupFieldModel provides more than three choices, Pedro uses a CombinationFieldView to render the items as a drop-down list. URIFieldViews use TextFieldModel objects whose XML Schema definitions used “xs:anyType” for the type attribute (see Appendix B). IDFieldView uses an instance of IDFieldModel to render a text field that is accompanied by a “Generate Key” button. The identifier service used to provide an identifier is a property of the IDFieldModel.

Figure XXX: Classes responsible for rendering edit fields in Desktop Pedro.

5.12.2.3 Classes for Generating List Fields in Desktop Pedro Forms

Figure XXX shows classes responsible for rendering list fields. A ListFieldView comprises a ListTypeManager and a ListValueManager, and it uses an instance of ListFieldModel. ListTypeManager manages the type of child record that will be created or edited when end-users press the “New” or “Edit” buttons. There are implementations of ListTypeManager which accommodate lists that support one or multiple types.
MultiListTypeManager renders a combination box populated with the kinds of records that can appear in the list field. ListValueManager is an abstract class that manages the display of list items. SingleListValueManager represents a one-item list as a single non-editable text field that shows the display name of the child record. MultiListValueManager shows the child records in a scrollable list of record names.

List fields are created by the pedro.desktopDeployment.RecordViewFactory class. It determines the type of list field to produce by inspecting the fieldViewType attribute of the ListFieldModel object.

Figure XXX: Classes responsible for rendering list fields in Desktop Pedro.

5.12.2.4 Classes for Generating Forms in Tablet Pedro

TabletPedro was designed to support the same data models as those used to run Desktop Pedro. However, we envisioned that the different forms of deployment would be used at different stages of editing the same document. End-users will tend to use TabletPedro to do simple data entry tasks which require them to be at a work site such as a laboratory or a remote field location. They will tend to use Desktop Pedro to support complex data entry tasks and to fill in the rest of the document.

TabletPedro was designed with the following principles in mind:

- minimise the feature set of the application to support essential tasks
- minimise the use of pop-up dialogs
- economise on screen real estate
- change features which rely on right-click menus

The classes used to generate forms in Tablet Pedro are in the pedro.tabletDeployment.* package. Many of them share the same names as classes that support Desktop Pedro. There are a few notable differences.

First, some of the menu items have been removed. For example, the File Menu does not contain an “Export to Final Submission Format” button. This feature was deemed non-essential because it was likely this would be done with the Desktop version.
To reduce the number of pop-up dialogs, some features were altered so they used a stack of screens instead of separate windows. For example, consider the “Window” feature in Desktop Pedro. When end-users switch from one file to another, a new window grabs focus. In Tablet Pedro, only one file is shown at a time. When end-users change the current file, it changes the file loaded in the current window. As another example, the DefaultOntologyViewer was altered so it relied on JPanel objects instead of JDialog objects. This was done so the ontology viewer was more easily embedded in a stack of windows.

To economise on screen real estate, the NavigationTree was removed. It is accessible via the “WhereAmI” button, which pushes a view of the NavigationTree onto the stack of windows. TabletPedro supports navigation via a RecordStack object. RecordStack is a drop-down list that shows the currently edited branch of the tree.

Finally, the right-click mechanism for activating ontology services is replaced by pressing a “Markup” button which appears at the end of a text field.

5.12.3 Design History

Initially, Pedro tightly coupled data objects with the objects that viewed them. This led to severe performance problems because Java would potentially be managing thousands of UI components, each using a collection of Swing objects. This led to stripping the native data structures of references to UI components. Eventually the production of form field views was centralised in the class pedro.desktopDeployment.RecordViewFactory.

The most significant change in the form generation classes came when Tablet Pedro was developed. Initially we wanted to run Pedro on a PDA. However, these devices often require a specialised form of the JVM which does not support Swing components. Although the native data structures had been stripped of references to UI components, they still had one crucial reference to the Swing libraries: the ChangeListener class.
We realised it would not be easy to port the code base to a PDA platform so instead we decided on a Tablet PC platform.

It became clear that we had to make some changes to the forms. The NavigationTree was taking up too much room in the display and it was difficult to tap a pen on some of the nodes. Too many windows popped up and they cluttered the screen area. We also observed that it was difficult to activate ontology services. In the Tablet display, a right-click action is done by tapping and holding the pen on a form label. This seemed too awkward to do, so a “Mark up” button was developed instead.

The development of TabletPedro took three consultation meetings with mass spectrometer scientist Jennifer Lynch and ten business days of coding. The result proved that Pedro’s design isolated its model aspects enough to develop new ways of visualising them.

5.12.4 Scope of Effect

The UI classes are probably not used by anything other than classes in the Desktop and Tablet deployment packages.

5.12.5 Relevant Code Packages

The classes used to generate user interfaces in Pedro are in the pedro.desktopDeployment.* and pedro.tabletDeployment.* packages.

6 Extending the Core Code Base

This section describes ways that developers could extend the core code base. The following subsections represent tasks which may be part of future enhancements or may represent common requests within the Pedro developer community.

6.1 Replacing the schema parser

Pedro communicates with the schema parser via the SchemaReaderInterface, which can be implemented to rely on XML Schema or other data modelling technologies. A few years ago, we tried to make Pedro run off a data model created by a knowledge acquisition system called PCPack. It was able to generate forms for simple models and proved that the application could be insulated from changes in the way models were read.

To adapt Pedro to suit another model interpreter:

1.
Create a Java class that implements the pedro.mda.schema.SchemaReaderInterface interface.
2. Develop code to parse the model file.
3. Use the model properties extracted from the model to build up definitions of RecordModel objects (see the pedro.mda.model package and Appendix B).
4. Register the RecordModel objects as templates in the RecordModelFactory object. These templates are used to instantiate records created in the program.
5. Change code in the constructor of pedro.mda.schema.Startup so that it instantiates a copy of the new schema reader instead of MSVSchemaReader.

The most important part of this activity is to be able to map model concepts to attributes in the RecordModel, EditFieldModel, ListFieldModel, IDFieldModel, GroupFieldModel and DataFieldModel classes found in the pedro.mda.model packages.

6.2 Adding an extra data layer

Pedro currently has two data layers inside each native .pdz file:

- a .pdr file which holds the data that appears in the data set
- a .meta file which holds meta data about the file. Most of the data held in this file represent the ontology terms which were used to mark up record fields.

The information layers are described in more detail in the I/O System described in Section 5.9. In the future, native files may contain additional layers which describe data quality or aspects of provenance. The way the .meta layer was developed provides a good example of how a new layer of information could be supported. For more information on this, please see Section 5.11, which deals with the design of Pedro’s Meta Data sub-system.

1. Develop data container classes which will hold the information you want to maintain.
2. Create a manager class that manages the data container classes.
3. Make the manager accessible to other parts of Pedro by registering it as a new variable in the pedroFormContext variable. You may have to add this change to classes such as pedro.desktopDeployment.PedroApplication and others that have a main method.
4.
Use the hash key you created in the previous step to access the variable from within pedro.io.NativeFileReader and pedro.io.NativeFileWriter.
5. Decide on a file extension to use for the layer, e.g. “*.provenance”.
6. Access the manager object and use it to read and write a layer file ending with the extension you chose.

The new layer should appear in *.pdz files. If you want this layer to be editable by people, then it is good to look to the design of the Meta Data Editor for guidance:

1. Write an XML schema which describes the information you want to maintain.
2. Develop plugins which will load and save data for only the information layer you want. You may have to replace the default File menu features “Open”, “Save” and “Exit” because these are designed to open and save a *.pdz file, not a specific information layer within a *.pdz file.

6.3 Creating a new field view

TabletPedro provides an example of how a new field view can be developed. The application uses pedro.tabletDeployment.TextFieldView instead of pedro.desktopDeployment.TextFieldView when it renders forms. The steps:

1. Make a class which extends desktopDeployment.EditFieldView and implements desktopDeployment.CustomisedFieldView (e.g. tabletDeployment.TabletTextFieldView).
2. Use the desktopDeployment.RecordViewFactory class to associate some field view type with the class you’ve written (see the constructor for tabletPedro.DataEntryPanel).

6.4 Adding Form Properties

The advent of the Pedro Configuration Tool has made it easy to extend the system so that it recognises new configuration attributes.

1. Add new fields to the configuration data model (see models/pedro_form_configuration/model/pedro_form_configuration.xsd).
2. Adjust pedro.mda.config.PedroConfigurationReader so that the parser can detect XML tags of the new attribute.
3. You may have to add new set/get routines in any number of the configuration data structures appearing in pedro.mda.config.*.
4.
Decide where in the code base you want to access the new configuration attributes. You will access the PedroConfigurationReader using code such as:

    PedroConfigurationReader configurationReader
        = (PedroConfigurationReader) pedroFormContext.getApplicationProperty(PedroApplicationContext.CONFIGURATION_READER);
    EditFieldConfiguration editFieldConfiguration
        = (EditFieldConfiguration) configurationReader.getConfigurationRecord(recordClassName, fieldName);
    ??? = editFieldConfiguration.getX();

In the code example, the “X” in “getX()” represents the new configuration attribute you want to use. You are most likely going to use the code snippet in plugins. However, many parts of Pedro have access to the pedroFormContext variable so you can effect changes to the core code base as well.

6.5 Creating a Web-based Version of Pedro

Pedro should be extensible enough to support new forms of deployment such as a version that works on the web. The model and view aspects of Pedro’s design were sufficiently well separated to allow Tablet Pedro to be developed with minimal work. An example of a web-based application that uses Pedro libraries is the web application generated by Pierre. Code for Pierre can be downloaded from the same SourceForge site used to host Pedro. The web application in the download relies on JSP, Java Servlets and the Struts Framework. It could be used to inform the development of a web-based version of the data entry tool.

6.6 Upgrading to Higher Versions of Java

Since its first release in February 2003, all code for Pedro has been written using JDK1.4. Developers should have no problem recompiling the code base to support JDK1.5 classes.

7 Future Enhancements

Pedro 2.0 is meant to be one of the last major releases of the tool. A great amount of work has been done to redesign the software so it can accommodate a greater range of plugins. The most important future enhancement will be replacing the existing MSV Schema Parser with Castor.
Currently Pedro is able to interpret only a fraction of XML Schema structures. We have found that the current level of support of schema features is more than adequate for most simple use cases. However, the benefit of enhancing the schema reader is that it can entertain a wider range of legacy schemas developed independently of the tool.

Future releases will also include features which have been inspired by Pierre, a project that builds on Pedro libraries. The features will appear in the Pedro Configuration Tool as plugins, and will be used to provide data modellers with more support for rapid prototyping activities.

7.1 Replacing the Schema Reader’s MSV Parser with Castor

7.1.1 Description

Pedro is critically dependent on its interactions with SchemaReaderInterface, which is currently implemented by the MSVSchemaReader. Former Pedro developer Kai Runte used Sun’s MSV schema reader to build the class. MSV is a very flexible library of functions which help interpret various kinds of XML Schemas. Kai’s work resulted in a schema reader that can understand approximately 10 of XML Schema’s 40 or more concepts. This limited support has allowed Pedro to be used in a variety of useful settings.

There are a number of reasons to rewrite the schema reader:

- MSVSchemaReader is complicated. Although it uses a Visitor pattern to traverse a syntax tree, it makes use of a number of embedded classes and HashMap variables that make the code a bit difficult to understand.
- MSV appears to be an aging library maintained by a single developer.
- We know the schema reader works but currently there are no developers on the project that can easily enhance it!

We decided that rather than trying to enhance the MSVSchemaReader, it was better to change schema reading technologies altogether. Chris Garwood investigated a number of technologies and found that the Castor Project provided the best alternative.
It helps to hide the details of parsing schemas and it is a project that appears to have a group of people who maintain it.

Castor generates Java class definitions for XML Schemas. We feel that rather than have a schema reader class parse schema syntax, it can apply Java introspection on the generated classes instead. We are reasonably confident that we could deduce all of the existing schema properties in addition to others such as references and inheritance.

7.1.2 Suggested Approach

1. Develop a suite of test cases that can be used to ensure that the new schema reader performs all the same tasks as the old one. This will probably involve testing the presence or absence of field properties in the template record definitions generated by the activity.
2. Wrap Castor so that it can be used programmatically to generate Java class definitions.
3. Introspect over the generated classes, and use property values in the same way they are described in Appendix B.
4. Apply the suite of test cases.
5. When all test cases pass, substitute the schema reader. This should only involve changing the reference to MSVSchemaReader found in pedro.mda.schema.Startup to instead reference the new SchemaReader.

7.1.3 Scope of Effect

Every tool made in the Pedro project relies on template record definitions produced by the schema reader. It is critical to thoroughly test properties of the templates produced by the new schema reader and compare them to those produced by the MSV Schema Reader. If there are no problems with how the templates are produced, the rest of the tools that use it should remain unaffected for currently supported schemas.

7.2 Auto-generate Functional Specifications

7.2.1 Description

Like Pedro, Pierre has a Configuration Tool which is used to rapidly prototype functional specifications of applications. Both tools use the configuration file made by the configuration tools to generate end-user applications.
However, Pierre also uses the configuration file to auto-generate functional specifications that are intended to be read by people. Developers, end-users and others can review the HTML document while they evaluate the application prototypes. To help foster iterations of development, Pierre allows designers to include end-user comments as part of the specification. These comments do not affect the behaviour of generated applications. However, they do appear in the auto-generated HTML document.

A future release of Pedro will include a similar feature for auto-generating functional specifications. The generation of documentation to suit a current prototype will help in cases where people want to discuss the tool without requiring a demonstration of the software.

7.2.2 Suggested Approach

1. Add a class called “Comment” to the XML Schema used by the Pedro Configuration Tool.
2. Add list fields containing comments to various classes defined in the schema.
3. Add a new “Comment” data container class to the pedro.mda.config.* package. The comments would not appear in the generated applications but would remain as notes for the developer to use in discussions with project leaders and potential end-users.
4. Modify pedro.mda.config.PedroConfigurationReader to handle parsing the comment records that appear in different parts of the configuration file.

7.2.3 Scope of Effect

This change should only affect the XML Schema for the Pedro Configuration Tool and the PedroConfigurationReader. The Comment class won’t be used by anything until a feature is developed to generate functional specifications.

7.3 Generate “Test” Feature

7.3.1 Description

The test feature will automatically generate prototypes based on the current state of the configuration file being developed in the Pedro Configuration Tool. In Pedro 2.0, data modellers usually test their work by saving the current configuration file and invoking Pedro using the “run_pedro” script included in the release.
The test feature would allow them to automatically generate applications by pressing one button.

7.3.2 Suggested Approach

The task could be supported by a PedroPlugin called TestPedroApplication, which is linked to the Option Menu via the Pedro Configuration Tool. The first task of the plugin would be to convert the current configuration model held in memory into instances of data container classes defined in the pedro.mda.config.* package. For example, a “record_model” record appearing in the configuration tool would have to be converted into an instance of pedro.mda.config.RecordConfiguration. During this step, a new instance of ConfigurationReader is populated with values derived from a tree of record model objects instead of those parsed from a ConfigurationFile.xml file. The next step is to launch a version of Pedro that uses the ConfigurationReader instance.

The code for the plugin might look like:

    import java.io.File;

    import pedro.mda.config.*;
    import pedro.mda.model.*;
    import pedro.mda.schema.*;
    import pedro.soa.plugins.PedroPlugin;
    import pedro.system.*;

    class TestPedroApplicationPlugin implements PedroPlugin {
        ...
        // the original listing omits the return type; void is assumed here
        public void execute(PedroFormContext pedroFormContext) {

            // get the root model of the current configuration file
            RecordModel currentRecordModel
                = (RecordModel) pedroFormContext.getProperty(PedroFormContext.CURRENT_RECORD_MODEL);
            RecordModelUtility recordModelUtility = new RecordModelUtility();
            RecordModel configurationFileRootModel
                = recordModelUtility.getRootModel(currentRecordModel);

            // use the converter you developed
            PedroConfigurationReader configurationReader
                = YourConversionClass.createConfigurationReader(configurationFileRootModel);

            // set up an execution environment which can be used to launch the test application
            PedroApplicationContext targetSchemaApplicationContext
                = (PedroApplicationContext) pedroFormContext.getApplicationProperty(
                    PedroApplicationContext.TARGET_SCHEMA_APPLICATION_CONTEXT);
            File targetSchemaModelDirectory
                = (File) targetSchemaApplicationContext.getProperty(
                    PedroApplicationContext.MODEL_DIRECTORY);

            // develop some routine to ensure you get the name of the model folder
            String targetSchemaModelFolder = parseModelFolder(targetSchemaModelDirectory);

            // let Pedro's "Startup", "WorkspaceFileFinder" and "Workspace" classes produce
            // a new instance of PedroFormContext -- this will hold all the environment
            // variables needed to launch a test version of the application
            WorkspaceFileFinder testWorkSpaceFileFinder
                = new WorkspaceFileFinder(".", targetSchemaModelFolder, false);
            Startup testStartup = new Startup(new PedroApplicationContext());
            testStartup.start(testWorkSpaceFileFinder.getSchema(),
                              testWorkSpaceFileFinder.getMainConfigurationURL(),
                              testWorkSpaceFileFinder.getLibraryDirectory(),
                              testWorkSpaceFileFinder.getDocumentDirectory(),
                              testWorkSpaceFileFinder.getResourceDirectory(),
                              testWorkSpaceFileFinder.getFileExtensionsToLaunchURL(),
                              testWorkSpaceFileFinder.getSessionFile(),
                              true,
                              true);
            Workspace testWorkSpace = Workspace.createWorkSpace(testStartup);
            testWorkSpace.setWorkSpaceFiles(testWorkSpaceFileFinder);
            PedroFormContext testFormContext = testWorkSpace.getPedroFormContext();

            // now set the configuration reader object
            testFormContext.setApplicationProperty(
                PedroApplicationContext.CONFIGURATION_READER, configurationReader);
        }
    }

7.3.3 Scope of Effect

The development should not affect existing code.

8 Overview of Code Packages

8.1 Package “pedro.configurationTool”

This package describes the code for the Pedro Configuration Tool. The tool is invoked through PedroConfigurationTool. Class names ending in “Plugin” implement the pedro.soa.plugins.PedroPlugin interface and are used to provide custom functionality for the configuration tool. Most of the other classes are used to provide dialogs for the plugins.

8.2 Package “pedro.desktopDeployment”

The pedro.desktopDeployment.* package describes classes that are used to make the desktop version of the data entry tool. It is quite large but its classes can be organised into just a few categories.

Classes that can be invoked as applications include PedroApplication and PedroAlerts. PedroService is intended to behave as a component that operates within an environment provided by a client application. The service form of deployment is how Pedro operates within other service platforms such as MyGrid.

Classes that describe the behaviour of Pedro’s NavigationTree include:

  NavigationTree
  NavigationTreePanel
  NavigationTreeNode
  NavigationView
  TreeSelectionEventManager
  TreeNodeRenderer

Perhaps the most complicated class in the package is TreeSelectionEventManager, which changes the main form to display the record selected in the NavigationTree. The complexity in the code comes from having to validate the current record before jumping to the next. NavigationView is a relatively new class which was developed to allow Pedro to treat the NavigationTree in the desktop deployment in a similar way to the pedro.tabletDeployment.RecordStack navigation widget used in the Tablet deployment.
Classes for rendering the main form include RecordView and all the form field classes, which are represented by classes ending with “FieldView”. Instances of field views are produced by RecordViewFactory.

8.3 Package “pedro.io”

This package contains classes that manage most of the features for reading and writing data to file. The main I/O routines are PedroDataFileWriter and PedroDataFileReader, which manage a single XML file that conforms to a schema. The XMLSubmissionFileReader and XMLSubmissionFileWriter classes wrap these classes but provide little functionality of their own. Although they are used to support the “Import from XML” and “Export to Final Submission Format” features in Pedro’s File menu, they will eventually be replaced with the classes they wrap. Other classes which will be phased out are BasicPedroFileReader and BasicPedroFileWriter, which are artefacts from older releases.

NativeFileFormatReader and NativeFileFormatWriter manage Pedro’s *.pdz files. These classes use the PedroDataFileWriter and PedroDataFileReader classes to write each information layer.

Older versions of Pedro had a package pedro.dataImport.*, whose classes helped import data from and export data to spreadsheets. In Pedro v2.0, the package was eliminated and the files were moved into the pedro.io.* package. The following data import classes support the “Import from Spreadsheet” and “Export to Spreadsheet” features in Pedro’s File menu:

  ExportToSpreadSheet
  FlatFileReader
  HeaderRemovalDialog
  ImportDataToFieldDialog
  ImportFromSpreadSheet
  ImportRecordSelectorDialog
  ImportTableHeaderClicker
  ImportTableModel
  RecordImporter
  RecordImportMenuItem
  TargetRecordFieldSelectorDialog

8.4 Package “pedro.mda.config”

This package contains classes which manage the configuration options for the data entry tool which are not covered by the XML Schema.
The most significant class, PedroConfigurationReader, reads a ConfigurationFile.xml file produced by the Pedro Configuration Tool, and uses instances of data container classes to manage the configuration options. The PedroConfigurationReader is referenced in many parts of the code base and provides a look-up service to find configuration options associated with schema concepts and other parts of the application. Most of the other classes in this package either parse parts of a ConfigurationFile.xml file or hold configuration data.

SessionManager manages a file called SessionAspects.xml which holds information about the most recently used files. It also uses an instance of FileLauncher to read the file ./config/FileExtensionsToLaunch.xml. This small file associates file extensions with shell commands which launch other applications. The mappings are used when end-users press the “View” button on a URL field view.

8.5 Package “pedro.mda.model”

This package describes Pedro’s native data structures. Section 4.2 provides most of the important information about this topic. Appendices A and B provide more information about what configuration properties are used to set attributes in native data structures such as RecordModel, ListFieldModel, EditFieldModel and AttributeFieldModel.

8.6 Package “pedro.mda.schema”

The package contains classes which are responsible for interpreting the XML schema. Most of this activity is encapsulated by Startup, which is used in the “main” class of almost every tool that uses Pedro libraries. Startup uses an implementation of SchemaReaderInterface to parse the schema. For now, that implementation remains the MsvSchemaReader class. Most of Pedro depends on this one class for extracting schema information and using it to produce template definitions of native data structures.

8.7 Package “pedro.metaData”

The package contains classes which manage Pedro’s meta data.
The classes can be divided into two groups:

  data container classes that hold meta data information
  classes that support the Pedro Meta Data Editor

DocumentMetaData and RecordMetaData hold meta data information about documents and records respectively. Their attributes are described by the meta data schema described in Appendix C.

Most of the other classes service the Pedro Meta Data Editor. Most of Pedro’s normal File menu features don’t work in the editor because they manage information in the data layer of the *.pdz file. SaveMetaDataFile, OpenMetaDataFile, ExitMetaDataEditor and CloseMetaDataFile provide file menu features that only affect the meta data layer of a *.pdz file.

8.8 Package “pedro.soa.alerts”

The Pedro Alerts system is based on the concept of an Alert, which is a collection of matching criteria associated with an intent such as an error, a warning, a request for communication or a desire to post a bulletin.

Data for criteria are managed by the following classes:

  EditFieldCriterionModel
  ListFieldCriterionModel
  MatchingCriterion
  EditFieldComparator

Criteria are visualised with the help of these classes:

  EditFieldCriterionView
  ListFieldCriterionView
  CriterionView
  MatchingCriteriaView

The intent of an alert is described by states defined in the AlertActionType class. Many of the classes such as PedroAlertsEditor, ValidationTreePanel, and AlertNode support the UI for the Pedro Alerts Editor. Some classes such as AlertsBundle, AlertsBundleReader and AlertWriter help read and write alerts to alert bundles, which are *.zip files that contain alerts expressed as *.xml files.

8.9 Package “pedro.soa.id”

This small package describes services which generate identifier values for text fields. The service is a Java class that implements the IDGeneratorService interface. Pedro comes with its own default service, DefaultIDGeneratorService. The creation of the services is managed by the IDGeneratorServiceFactory.
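The relationship between an identifier service, a default implementation and a factory can be illustrated with a minimal sketch. This document does not show the method signatures of IDGeneratorService, so the interface below is a hypothetical stand-in, and the class names, the counter-based identifier format and the fall-back to a default generator are all assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for pedro.soa.id.IDGeneratorService; the real
// interface's method names and parameters are not shown in this document.
interface IdGeneratorSketch {
    String nextIdentifier(String fieldName);
}

// A counter-based generator, playing the role of a default service.
class CounterIdGenerator implements IdGeneratorSketch {
    private final AtomicLong counter = new AtomicLong();

    @Override
    public String nextIdentifier(String fieldName) {
        // identifier format is an assumption: field name plus a running number
        return fieldName + "_" + counter.incrementAndGet();
    }
}

// Hypothetical factory: creates a generator from a configured class name and
// falls back to the counter-based generator when no class name is given.
class IdGeneratorFactorySketch {
    static IdGeneratorSketch create(String className) {
        if (className == null || className.isEmpty()) {
            return new CounterIdGenerator();
        }
        try {
            Object service = Class.forName(className)
                                  .getDeclaredConstructor()
                                  .newInstance();
            return (IdGeneratorSketch) service;
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                "Cannot create ID generator: " + className, e);
        }
    }
}
```

Loading the service by class name mirrors the way the configuration schema refers to services through a class_name property, but whether the real factory works this way is not confirmed here.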
8.10 Package “pedro.soa.ontology.provenance”

This package manages the meta data that Pedro gathers about the ontology terms used to mark up form fields. OntologyTermProvenance holds meta data about an ontology term. The class manages an instance of OntologyMetaData, which holds general information about the ontology such as its name and version. An OntologyTermProvenance object also has an OntologyTermMetaData object that holds meta data that specifically relates to the term. The attributes of these classes correspond to parts of the meta data schema which drives the Pedro Meta Data Editor (see Appendix C).

OntologyTermProvenanceManager keeps track of all the terms that are used in a session. When a user opens a file, Pedro reads the meta data file and populates the manager with terms that have already been used to tag the document.

8.11 Package “pedro.soa.ontology.sources”

This package contains all the classes which support the ontology source class that is part of the Pedro Ontology Service Framework. OntologySource is the main interface developers must implement to make their own providers of ontology terms. TreeOntologySource extends OntologySource by supporting the notion of a tree of terms.

Most of the classes in this package support default implementations of TreeOntologySource. AbstractTreeOntologySource contains most of the code for searching through a tree. It also makes use of OntologyTreeCloner to return sub-ontologies, which are copies of the parts of the tree rooted at certain terms. TabIndentedTextSource extends AbstractTreeOntologySource and reads terms from a tab-indented text file. XMLOntologySource extends the same class but uses its own particular XML-based format for expressing ontology terms. SingleColumnTextSource is an example of an ontology source that implements OntologySource but not TreeOntologySource. It reads its terms from a text file containing a single column of terms.
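As an illustration of how a tab-indented text file can be turned into a tree of terms, the following sketch counts the leading tabs on each line and uses a stack to track the current path from the root. It is not the code of TabIndentedTextSource; TermNode and TabIndentedParserSketch are hypothetical names, and the rule that indentation may only deepen by one level at a time is an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a tree term; this is not Pedro's TreeOntologyTerm.
class TermNode {
    final String label;
    final TermNode parent;
    final List<TermNode> children = new ArrayList<>();

    TermNode(String label, TermNode parent) {
        this.label = label;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);
        }
    }
}

class TabIndentedParserSketch {

    // Returns an unlabelled root whose children are the top-level terms.
    static TermNode parse(List<String> lines) {
        TermNode root = new TermNode("", null);
        // The stack holds the root plus one open term per indentation level.
        Deque<TermNode> path = new ArrayDeque<>();
        path.push(root);

        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            int depth = 0;
            while (depth < line.length() && line.charAt(depth) == '\t') {
                depth++;
            }
            // Close terms that are at the same level or deeper than this one.
            while (path.size() > depth + 1) {
                path.pop();
            }
            if (path.size() < depth + 1) {
                throw new IllegalArgumentException(
                    "Indentation skips a level at: " + line.trim());
            }
            TermNode node = new TermNode(line.substring(depth).trim(), path.peek());
            path.push(node);
        }
        return root;
    }
}
```

A single-column source would be the degenerate case in which every line has depth zero, so every term becomes a direct child of the root.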
There are a number of classes related to the representation of an ontology term. OntologyTerm is a data container class with properties that include an identifier, a label, and a collection of related terms. The class does not describe how terms are related. Instead, the relation is implied by the OntologyRelationshipType parameter that is used when an OntologySource returns a collection of terms related to a given term. TreeOntologyTerm extends OntologyTerm to include a notion of a parent term. Most of the default ontology source implementations use TermIdentifierUtility to create default identifiers for terms.

The package also includes a number of marker interfaces that provide rendering hints for Pedro’s pedro.soa.ontology.views.DefaultOntologyViewer class. ImageDescriptionSupport implies that most terms will have an associated image. PictureOntologySource, a sub-class of XMLOntologySource, uses this interface. The viewer uses the presence of this interface to show terms as a collection of thumbnail images. DictionaryDescriptionSupport implies that most terms are associated with a definition. The viewer uses this information to support a tabular view of terms with columns for term and definition. URLDescriptionSupport implies that a source will have a help web page for most terms. Sources which implement this interface cause the viewer to include an HTML panel to present the web page for the currently selected ontology term.

8.12 Package “pedro.soa.ontology.views”

This package contains all the classes which support the ontology viewer that is part of the Pedro Ontology Service Framework. The OntologyServiceManager is responsible for listening to right-click actions end-users make when their mouse cursor hovers over the label of a form field which supports ontology services. When this happens, the manager class produces a popup menu with links to OntologyServices that have been associated with the field.
If an ontology contains at most 40 terms, Pedro uses instances of OntologyTermMenuItem to render ontology terms as menu items. If the service has more than 40 terms, it delegates to an OntologyViewer. If the OntologyService has no OntologyViewer, Pedro uses its own default viewer, DefaultOntologyViewer. This class interrogates OntologySource objects to determine what other interfaces they implement. It uses the presence of other interface implementations to determine which of the following default views it can use to render an ontology:

  ListView - renders all pedro.soa.ontology.sources.OntologySource objects.
  TreeView - renders all pedro.soa.ontology.sources.TreeOntologySource objects. Note that the items shown in the TreeView’s tree display are instances of OntologyTermNode. This class extends javax.swing.tree.DefaultMutableTreeNode.
  DictionaryView - renders sources which also implement pedro.soa.ontology.sources.DictionaryDescriptionSupport. Note that DictionaryView is a JTable which commits its data to an instance of the DictionaryTableModel.
  PictureView - renders sources which also implement pedro.soa.ontology.sources.ImageDescriptionSupport.

The default viewer is able to interact with the views in a consistent manner because they all implement the OntologyView interface. This is an interface that was specifically developed to allow the default viewer to support multiple ways of presenting ontologies to end-users. The views register the default viewer as an OntologyViewListener. This interface is used to alert DefaultOntologyViewer when terms have been selected in one of the views.

When users have chosen their terms in some kind of OntologyViewer, the viewer notifies an OntologyTermSelectionListener. This listener is responsible for recording the terms that have been used and inserting the term labels into the appropriate form field. Pedro has its own DefaultOntologyTermSelectionListener class which supports this interface.
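The two decisions described in this section, the 40-term threshold and the choice of default views from marker interfaces, can be sketched as follows. The interfaces and classes below are hypothetical stand-ins for the Pedro types (the real method signatures are not shown in this document), and the views are represented only by their names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for OntologySource, TreeOntologySource and the
// marker interfaces; termCount() is an assumed method.
interface SourceSketch { int termCount(); }
interface TreeSourceSketch extends SourceSketch { }
interface DictionarySupportSketch { }
interface ImageSupportSketch { }

// A flat source with no extra rendering hints.
class FlatSourceSketch implements SourceSketch {
    private final int count;
    FlatSourceSketch(int count) { this.count = count; }
    @Override public int termCount() { return count; }
}

// A tree source whose terms also carry definitions.
class DictionaryTreeSourceSketch implements TreeSourceSketch, DictionarySupportSketch {
    private final int count;
    DictionaryTreeSourceSketch(int count) { this.count = count; }
    @Override public int termCount() { return count; }
}

class ViewSelectionSketch {
    // Threshold taken from the text: at most 40 terms are shown as menu items.
    static final int MENU_TERM_LIMIT = 40;

    static boolean rendersAsMenuItems(SourceSketch source) {
        return source.termCount() <= MENU_TERM_LIMIT;
    }

    // Names of the default views that could render the given source.
    static List<String> applicableViews(SourceSketch source) {
        List<String> views = new ArrayList<>();
        views.add("ListView"); // every source can be listed
        if (source instanceof TreeSourceSketch) {
            views.add("TreeView");
        }
        if (source instanceof DictionarySupportSketch) {
            views.add("DictionaryView");
        }
        if (source instanceof ImageSupportSketch) {
            views.add("PictureView");
        }
        return views;
    }
}
```

Using marker interfaces in this way lets a new source opt in to a richer view simply by declaring an extra interface, without any change to the viewer.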
8.13 Package “pedro.soa.plugins”

This package contains all the classes that are needed for developers to extend the system with their own plugins. PedroPlugin is the main interface that developers must use if they are making their own plugins. The class implementing PedroPlugin can also implement a number of marker interfaces such as DataExportPlugin, DataImportPlugin, ValidationPlugin and AnalysisPlugin. When Pedro registers plugins for the current form, it shows a count of import, export and analysis plugins on the status bar.

At startup, Pedro uses PluginFileFilter to identify JAR files that end in the *.plugins extension. The PluginLoader examines each Java class in these JAR files to determine whether it implements the PedroPlugin interface and other marker interfaces. It also determines whether a plugin can be applied to the current record type being displayed.

If the currently displayed record has plugins associated with it, Pedro renders a button in the top right corner of the form. If a field is associated with plugins, the tool renders the same kind of button at the end of the form field. When end-users press the button, an instance of PluginSelectionDialog appears. The dialog presents the end-users with a list of the available plugins.

8.14 Package “pedro.soa.security”

This package contains basic classes for supporting a security service. The service is used to determine whether a given User can access application features such as ontology services, form buttons, and menu items. For now, Pedro relies on a DummySecurityService which allows full access to any feature. In future, other implementations of the SecurityService will be used to mask document data, restrict some features and provide others as part of a system of user preferences.

8.15 Package “pedro.soa.validation”

The classes in this package support Pedro’s facilities for validating the data set. There are three main interfaces that developers can implement to create their own validation services.
DocumentValidationService is for services which validate the contents of an entire data set. They are triggered whenever an end-user attempts to use the “Export to Final Submission Format” button in the File Menu or the “Show Errors” button in the View Menu.

RecordModelValidationService is for services which are meant to identify illegal combinations of form field values. This service is triggered whenever the end-users press the “Keep” or “Done” buttons on the main record form.

FieldValidationService is for services which validate the contents of a particular field. EditFieldValidationServices are used to validate a single field value, whereas a ListFieldValidationService will be used to identify errors in the number and type of child records a list field contains.

Most of the classes in the package focus on default field validation services which perform type checking on field values. All of them extend AbstractEditFieldValidationService, which contains code for managing the field name and for determining whether a field value is empty. The rest of the classes apply type checking to field values which are non-empty. Pedro has a separate validation service to scan for required fields which are left empty.

Pedro initiates a validation activity via the ValidationFacility class, which in turn applies appropriate field, record and document level validation services.

8.16 Package “pedro.tabletDeployment”

The package describes classes that are used to create the TabletPC deployment of the data entry tool. Most of the classes have the same names as others which are used in the desktop deployment.
8.17 Package “pedro.util”

The following classes relate to Pedro’s context sensitive help system:

  ContextHelpItem
  ContextHelpService
  HelpEnabledButton
  HelpEnabledCheckBox
  HelpEnabledCheckBoxMenuItem
  HelpEnabledLabel
  HelpEnabledMenuItem
  HelpLinkCreator
  HelpLinkListener

Other classes relate to file filters that are used to limit file searches to include certain types of files:

  XMLFileFilter
  XSDFileFilter
  ZIPFileFilter
  HTMLFileFilter
  PedroFileFilter
  PedroBackupFileFilter

Most of the remaining classes order or display items in a list.

8.18 Package “pedro.system”

The most important classes in this package are Context classes that hold a collection of environment variables. These variables refer to different parts of the Pedro Application. The context classes include:

  Context
  PedroFormContext
  PedroDocumentContext
  PedroApplicationContext

PedroUIFactory is used extensively throughout the application. It centralises the creation of all components. PedroResources is also used in many classes to provide the String values that appear in UI components. ModelSelectorDialog is the dialog that allows end-users to run Pedro with a model.

8.19 Package “pedro.soa”

This package contains ServiceClass, the basic service class, and GeneralServiceFactory, the basic factory class for creating services. The package also contains interfaces for making components that can edit fields.

8.20 Package “pedro.workBench”

This is a simple package meant to provide a UI display for activating all the other Pedro tools. The main class is PedroWorkBench.

9 Index

Appendix A: Schema for the Pedro Configuration Tool

This appendix describes the XML Schema that drives the Pedro Configuration Tool. The schema defines the record structures that hold configuration data when the tool is running. Many of the schema classes correspond to classes that appear in the code base.
For example, the “ontology_service” definition corresponds to pedro.soa.ontology.views.OntologyService and the “record_model” definition corresponds to pedro.mda.model.RecordModel. The file ConfigurationFile.xml which appears in the ./config directory of each model folder should validate against this schema.

A.1 Configuration Options for Menu Features

Classes which represent properties of Pedro menus are illustrated in Figure A-1. The following subsections describe properties of menus that appear in the menu bar of a Pedro dialog.

Figure A-1: part of the Pedro configuration schema that relates to properties of menu features.

A.1.1 Class: “menu_features”

  Property               Description
  existing_menus         collection of menu configuration records that describe the functionality of the standard File, Edit, Option, View and Help application menus.
  custom_menu            a collection of custom_menu objects that describe the features of custom menus.

A.1.2 Class: “existing_menus”

  Property               Description
  file_menu              configuration record for the File menu; if it is absent the menu is not included in the menu bar of the generated application.
  edit_menu              configuration record for the Edit menu; if it is absent the menu is not included in the menu bar of the generated application.
  view_menu              configuration record for the View menu; if it is absent the menu is not included in the menu bar of the generated application.
  options_menu           configuration record for the Options menu; if it is absent the menu is not included in the menu bar of the generated application.
  help_menu              configuration record for the Help menu; if it is absent the menu is not included in the menu bar of the generated application.
  include_window_menu    Pedro has a “Windows” menu that shows a list of files that are currently open. If this configuration value is true, the menu will appear. Otherwise, the Windows menu will not appear in the generated application.
A.1.3 Class: “file_menu”

  Property                               Description
  show_new_file                          include the “New...” menu item
  show_favourites                        include the “” menu item
  show_spreadsheet_options               include the “Import from Spreadsheet” and “Export to Spreadsheet” menu items
  show_open_file                         include the “Open” menu item
  show_save_file                         include the “Save” menu item
  show_saveAs_file                       include the “Save As...” menu item
  show_close                             include the “Close” menu item
  show_import_records                    deprecated. Import records used to be a menu where I/O plugins appeared. This is no longer necessary because of the way the new plugins system works.
  show_import_from_xml                   include the “Import from XML...” menu item
  show_export_final_submission_format    include the “Export to Final Submission Format” menu item
  show_templates                         include the menu items “Load Template” and “Save Template”
  show_load_template                     include the “Load Template” menu item
  show_save_template                     include the “Save Template...” menu item
  show_exit                              include the “Exit” menu item
  plugin                                 collection of plugin objects that describe customised application features.

A.1.4 Class: “edit_menu”

  Property      Description
  show_copy     include the “Copy” menu item. This feature allows end-users to copy text from a form field or copy a sub-tree of records.
  show_paste    include the “Paste” menu item. This feature allows end-users to paste text into the current field or a sub-tree of records into the current record.
  plugin        collection of plugin objects that describe customised application features.

A.1.5 Class: “options_menu”

  Property                  Description
  show_describe_document    include the “” menu item
  show_alerts               include the “” menu item
  plugin                    collection of plugin objects that describe customised application features.
A.1.6 Class: “view_menu”

  Property             Description
  show_errors          include the “Show Errors” menu item
  show_dependencies    include the “Show Dependencies” menu item
  show_changes         include the “Show Changes” menu item
  show_search          include the “Show Search” menu item
  show_clear           include the “Show Clear” menu item
  plugin               collection of plugin objects that describe customised application features.

A.1.7 Class: “help_menu”

  Property                   Description
  show_about                 include the “Show About” menu item
  show_schema_information    include the “Show Schema Information” menu item
  show_context_help          include the “Enable Context Help” menu item
  help_document              a collection of help_document objects. A help_document is used to render a help menu button which is associated with a pop-up web page.
  plugin                     collection of plugin objects that describe customised application features.

A.1.8 Class: “help_document”

  Property    Description
  label       the name of the menu item representing the link to a help web page
  link        a URL or a local file path for a web page.

A.1.9 Class: “plugin”

  Property         Description
  name             name of the plugin; this is used to represent the plugin in lists and menus.
  feature_code     a unique identifier for the feature. In future, this will be used along with a User object to have a security service determine whether a plugin should appear in the application.
  class_name       a Java class that implements the interface pedro.soa.plugins.PedroPlugin
  list_order       not yet implemented; the list order helps define the order in which a group of plugins is displayed.
  tool_tip         text that hovers over a user interface object which represents the plugin (e.g. a menu item)
  description      a description of what the plugin does
  is_persistent    determines whether pedro.soa.plugins.PluginFactory creates multiple instances of a plugin or a single instance of a plugin through the lifetime of an application session.
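The effect of the is_persistent property can be illustrated with a small sketch of a factory that either builds a fresh object on every request or hands back one shared instance per class name. PersistentFactorySketch is a hypothetical class written for this illustration; it is not the code of pedro.soa.plugins.PluginFactory, whose implementation is not shown in this document.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical factory illustrating the is_persistent semantics.
class PersistentFactorySketch<T> {

    // Shared instances, keyed by the configured class name.
    private final Map<String, T> sharedInstances = new HashMap<>();

    T get(String className, boolean isPersistent, Supplier<T> constructor) {
        if (!isPersistent) {
            // non-persistent: a new instance for every request
            return constructor.get();
        }
        // persistent: create once, then reuse for the rest of the session
        return sharedInstances.computeIfAbsent(className, key -> constructor.get());
    }
}
```

A persistent plugin can therefore keep state between invocations during a session, whereas a non-persistent one starts from a clean state each time it is requested.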
A.1.10 Class: “custom_menu”

  Property        Description
  name            name of the menu
  feature_code    code used to uniquely identify an application feature. In future releases, this will be used by a security service to determine whether a given user can access it or not.
  position        the relative position a menu has with respect to other custom menus that appear in the menu bar.
  tool_tip        help text which appears when an end-user’s mouse cursor hovers over the menu item
  help_link       a web page associated with the menu. This will be shown if context-sensitive help is activated.
  plugin          collection of plugin objects that describe customised application features.

A.2 Configuration Options for Record Structures

Classes that describe configuration properties for Pedro’s native data structures are shown in Figure A-2.

Figure A-2: part of the Pedro configuration schema that describes data structures

The following tables describe the properties of the major classes in the class diagram.

A.2.1 Class: “schema_concept_field”

Note that this class doesn’t actually appear in the Pedro configuration schema. It is included to make diagramming easier. The following properties represent attributes in the native data structure pedro.mda.model.DataFieldModel.

  Property               Description
  name                   name of the schema concept
  ontology_identifier    an identifier which represents an XML Schema form concept as an ontology identifier. Pedro’s Ontology Service Framework allows ontology services to ask Pedro questions about what concepts appear in the current form. This information can include what field called the service, what other fields exist, the kind of record currently being displayed and other ontology terms that have been used to mark up other form fields. The identifier makes XML schema concepts more compatible with formalisms used by some kinds of ontology technologies.
  tool_tip               help text which appears over the User Interface object that represents a schema concept. This can be the title of the record form or the labels of form fields.
  help_link              a URL for a web page that describes the schema concept for the end-users.
  form_comments          comments associated with a schema concept which appear on the form. For a record, the form comments will appear immediately under the record title on the main form. Comments for fields will appear immediately above them.
  plugin                 a collection of plugins associated with the concept. If plugins are present for a record, Pedro will render a “Plugins...” button in the top right corner of the main form. Field-level plugins are represented by the same button, which appears at the end of a form field.

A.2.1 Class “record”

The “record” class in the schema describes configuration properties that are used to set attributes of pedro.mda.model.RecordModel.

  Property                     Description
  attribute_field              a collection of fields which have been identified as identifier fields in the XML Schema. Such fields will make use of “ID” and “IDREF” properties.
  edit_field                   a collection of edit fields that each support a single scalar value.
  list_field                   a collection of list fields, each of which may hold one or more records of one or more types.
  record_validation_service    a validation service designed to validate combinations of form field values.

A.2.2 Class “list_field”

The “list_field” class describes configuration properties that are used to set attributes of pedro.mda.model.ListFieldModel.

  Property                      Description
  list_field_editing_service    a collection of list_field_editing_service objects. Each object describes an editing component that is invoked whenever a user presses the “New” or “Edit” buttons to make a new record of a particular child record type.

A.2.3 Class “edit_field”

The “edit_field” class describes configuration properties that are used to set attributes of pedro.mda.model.EditFieldModel.

  Property                     Description
  default_value                the value that will be used to populate a new record when it is first created and displayed in the main form.
  units                        units associated with the field value (e.g. cm, hrs, km/h...)
  allow_free_text              determines whether a text field accepts free text or not. This is used with ontology services to make form fields that accept free text, only ontology terms selected from a service, or a combination of the two.
  is_scrolling_text_field      determines whether a text field should be rendered with a single text line or in a text area enclosed within a scroll pane.
  is_display_name_component    whether an edit field value is used as part of the display name that advertises the parent record model in lists.
  field_validation_service     a collection of field_validation_service objects, each of which describes a service which performs an error check on the field value.
  ontology_service             a collection of ontology_service objects, each of which describes an ontology service that marks up form fields with ontology terms.

A.2.4 Class “attribute_field”

The “attribute_field” class describes configuration properties that are used to set attributes of pedro.mda.model.AttributeFieldModel.

  Property                Description
  id_generator_service    an id_generator_service object that describes the service used to provide identifier values for an attribute field.

A.3 Configuration Options for Service Classes

Figure A.3-1: the part of the Pedro configuration schema that describes services

Figure A.3-1 describes the classes defined in the XML Schema which hold configuration options about Pedro’s services. Note that service_class doesn’t actually exist in the configuration schema but is included here for ease of diagramming the classes. The following tables describe the schema properties in detail. Most of the classes inherit all their attributes from service_class and do not have their own tables of attribute descriptions.

A.3.1 Class “service_class”

Properties from this synthetic schema class correspond to properties found in pedro.soa.ServiceClass.

  Property         Description
  class_name       the fully qualified path of a Java class.
parameter: a collection of parameter objects, which are name-value pairs. Parameters are used to initialise the service.
is_persistent: if the value is false then service factories will produce a new instance of the service. If the value is true the factories will return a single managed instance of a service.

A.3.2 Class “ontology_service”

This schema class holds configuration options that are used to configure pedro.soa.ontology.views.OntologyService.

Property: Description
name: name of the ontology service. This name will be displayed in lists of services for the end-user.
feature_code: a unique identifier for the ontology service. This will be used later in projects that are trying to use the same services to perform semantic searches in data repositories.
description: description of the ontology the service provides.
ontology_source: the component that provides ontology terms; this is optional but one of source or viewer must be present in an ontology service.
ontology_viewer: the component that views ontology terms; this is optional but one of source or viewer must be present in an ontology service.

A.3.3 Class “list_field_editing_service”

Pedro supports the use of components for editing edit and list fields. When these are specified, the components are invoked for editing rather than another Pedro form. An edit field can have exactly one kind of editing service. However, in a multiple-type list, there could be a list field editing service for each supported child record type. This 1:N relationship is why list_field_editing_service has its own class but edit_field_editing_service does not.

Property: Description
name of the ontology service. This name will be displayed in lists of services for the end-user.
editing_component_class_name: fully qualified path of the Java class that implements the editing service
record_class_name

A.3.4 Interfaces Implemented by Service Classes and Plugins

A number of schema classes used to describe services have a field that holds the name of a class which implements some kind of interface. The following table indicates which interface should be implemented by a Java class associated with each service:

Service type described in the XML Schema: Interface expected to be implemented by service class
ontology_source: pedro.soa.ontology.sources.OntologySource or pedro.soa.ontology.sources.TreeOntologySource
ontology_viewer: pedro.soa.ontology.views.OntologyViewer
field_validation_service: pedro.soa.validation.FieldValidationService or pedro.soa.validation.EditFieldValidationService or pedro.soa.validation.ListFieldValidationService
record_validation_service: pedro.soa.validation.RecordModelValidationService
document_validation_service: pedro.soa.validation.DocumentValidationService
plugin: pedro.soa.plugins.PedroPlugin
id_generator_service: pedro.soa.id.IDGeneratorService
list_field_editing_service: pedro.soa.ListFieldEditingComponent
editing_component_class_name: pedro.soa.EditFieldEditingComponent

Appendix B: Mapping XML Schema Attributes to Application Properties

The following table shows how code fragments of an XML Schema influence the creation of template records and the generation of forms.

XML Schema Construct:
    <xs:element name="Organism">
      <xs:complexType>
        <xs:sequence>
          ... (field definitions) ...
        </xs:sequence>
      </xs:complexType>
    </xs:element>
Effect on Form Generation: this structure represents a record form called “Organism”. Information about the form is used to create a template of a RecordModel object. “Organism” would be used to set the record_class_name attribute.

XML Schema Construct:
    <xs:element name="species_name" .../>
Effect on Form Generation: represents an edit form field that holds one value. Information about the field is used to create a template of an EditFieldModel. “species_name” would be used to set the name attribute.

XML Schema Construct:
    <xs:element ref="Organism"/>
  or
    <xs:group ref="ProcessingStep" .../>
    ...
    <xs:group name="ProcessingStep">
      <xs:choice>
        <xs:element ref="FreezeSample" minOccurs="0"/>
      </xs:choice>
    </xs:group>
    ...
Effect on Form Generation: represents a list field. “ref” indicates the field should be rendered as a list. The <xs:element..> example indicates the list will support only records of type “Organism”. The <xs:group...> example indicates the list will support multiple types of records defined in an <xs:group...> declaration. In the example, one of the record types supported in the list would be “FreezeSample”. In the top example, “Organism” would be used to set the childTypes attribute of ListFieldModel. In the bottom example, an array of Strings including “FreezeSample” would be used to set the same childTypes attribute in ListFieldModel. The value for childTypes is used to determine the rendering hint expressed in the fieldViewType of DataFieldModel. The hint describes whether a list should be rendered to support one or multiple types of children.

XML Schema Construct:
    <... minOccurs="0"/>
  or
    <... minOccurs="1"/>
Effect on Form Generation: determines whether a form field is optional or required. A minOccurs value of “0” means the field is optional. A value of “1” indicates the field is required. All other values are ignored by the schema reader. The minOccurs value is used to set the “isRequired” field of DataFieldModel.

XML Schema Construct:
    <... maxOccurs="1"/>
  or
    <... maxOccurs="unbounded"/>
Effect on Form Generation: determines whether a list field holds one item or multiple items. The maxOccurs value is used to set the fieldViewType of DataFieldModel with a rendering hint for multiple-item or single-item lists.

XML Schema Construct:
    <...type="xs:string".../> or
    <...type="xs:integer".../> or
    <...type="xs:float".../> or
    <...type="xs:decimal".../> or
    <...type="xs:double".../> or
    <...type="xs:positiveInteger".../> or
    <...type="xs:date".../>
Effect on Form Generation: indicates which type-based field validation service will be associated with an edit field.

XML Schema Construct:
    <...type="xs:boolean".../>
Effect on Form Generation: causes Pedro to render the field as a boolean form field with two radio buttons for “true” and “false”. This declaration is used to set “true” and “false” values for the choices attribute of GroupFieldModel.

XML Schema Construct:
    <...type="xs:anyURI".../>
Effect on Form Generation: indicates an edit field that can take either a URL or a file path. The field is rendered as a text field that is accompanied by a “Browse” button which allows end-users to search for a file.

XML Schema Construct:
    <xs:element name="organism_type">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="mammal"/>
          <xs:enumeration value="bird"/>
          <xs:enumeration value="amphibian"/>
          <xs:enumeration value="reptile"/>
          <xs:enumeration value="fish"/>
          <xs:enumeration value="micro-organism"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>
Effect on Form Generation: an <xs:restriction...> tag with enumerations is rendered as a combination field. If there are three or fewer enumerations, the field is rendered as a collection of radio fields. If there are more than three enumerations, the field is rendered as a drop-down list of choices. The enumerations are used to set the choices attribute of GroupFieldModel.

XML Schema Construct:
    <xs:element name="sample_code">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:pattern value="[A-Z]*"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>
Effect on Form Generation: an <xs:restriction...> tag that contains an <xs:pattern../> tag will cause Pedro to render a text field which is associated with a validation service that checks whether the value complies with a regular expression.
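Two of the rules in the table above are simple enough to restate as code: the choice between radio fields and a drop-down list for enumerations, and the reading of minOccurs as an optional/required flag. The following is a minimal sketch only. The class SchemaHintSketch, the enum ChoiceWidget and both method names are hypothetical and do not come from the Pedro code base, and the handling of “ignored” minOccurs values (keeping whatever setting the field already had) is an assumption about what “ignored” means.

```java
// Hypothetical sketch: the class, enum and method names are illustrative
// only and are not part of Pedro's API.
final class SchemaHintSketch {

    /** The two renderings the table describes for enumerated fields. */
    enum ChoiceWidget { RADIO_BUTTONS, DROP_DOWN_LIST }

    /** Three or fewer enumerations give radio fields; more than three give a drop-down list. */
    static ChoiceWidget widgetFor(int enumerationCount) {
        return enumerationCount <= 3
            ? ChoiceWidget.RADIO_BUTTONS
            : ChoiceWidget.DROP_DOWN_LIST;
    }

    /**
     * minOccurs "0" means optional and "1" means required. Any other value is
     * ignored; this sketch assumes that means the current setting is kept.
     */
    static boolean isRequired(String minOccurs, boolean currentValue) {
        if ("0".equals(minOccurs)) {
            return false;
        }
        if ("1".equals(minOccurs)) {
            return true;
        }
        return currentValue;
    }

    public static void main(String[] args) {
        // organism_type has six enumerations, xs:boolean has two choices.
        System.out.println(widgetFor(6));           // DROP_DOWN_LIST
        System.out.println(widgetFor(2));           // RADIO_BUTTONS
        System.out.println(isRequired("0", true));  // false
        System.out.println(isRequired("5", true));  // true (value ignored)
    }
}
```

Under these assumptions, the six-value organism_type example would be drawn as a drop-down list, while the two-value boolean field falls on the radio-button side of the same threshold.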
Appendix C: Schema for the Pedro Meta Data Editor

The Pedro Meta Data Editor is driven off the schema described in Figure C-1:

Figure C-1: the schema describing meta data managed by Pedro

The pedro_meta_data class holds the summary data about a document. It will have instances of record_meta_data for each record type that appears in the data layer. A record_meta_data object will hold the number of times that a record type appears in the document, and it will have a collection of field_meta_data objects. The field_meta_data objects hold a collection of ontology terms that are used to mark-up form fields.

The ontology_term class holds all the provenance data about a term. Most of the fields are borrowed from the SKOS standard that describes ontology meta data. Fields prefixed by “ontology_service” describe the software service, not the terms themselves. This might be important for software agents which need to know about the service providing the terms.

Items such as super_class and characteristic are string identifiers, but they have been expressed in their own separate classes. This is because Pedro doesn’t have a mechanism for displaying arrays of simple types. To support lists, these fields have to be expressed as separate classes. Other fields that appear at most once are also expressed in their own classes, but at some point they may be recast as a single value edit field within the ontology term class. The following tables describe the class properties in detail.

C.1 Class: “pedro_meta_data”

This class represents the top level form that appears in the Pedro Meta Data Editor. Most of its fields describe general information about the document.

Property: Description
title: title of the document; provided by the end-user through the “Describe this document” feature in the Options menu of the Pedro editor.
author: author that produced the document; provided by the end-user through the “Describe this document” feature in the Options menu of the Pedro editor.
institution: institution that produced the document; provided by the end-user through the “Describe this document” feature in the Options menu of the Pedro editor.
document_description: a summary that describes the nature of the document; provided by the end-user through the “Describe this document” feature in the Options menu of the Pedro editor.
record_meta_data: collection of record_meta_data objects that retain information about record types which appear in the document

C.2 Class: “record_meta_data”

This class describes meta data for a record type defined in the target schema.

Property: Description
name: a record type defined in the target schema
frequency: the number of times the record type appears in the data layer of the document
field_meta_data: collection of field_meta_data objects that hold meta data about the edit fields which are defined in the target schema for the given record_type

C.3 Class: “field_meta_data”

This class describes meta data for a field defined in the target schema

Property: Description
name: the name of an edit field defined in the target schema. The edit field belongs to the record type described in the “name” field of the record_meta_data class.
ontology_term: a collection of ontology term instances that record meta data about terms used to mark-up form fields.

C.4 Class: “ontology_term”

The ontology_term class holds provenance data about ontology terms used to mark-up form fields. Most of the fields are described in the SKOS standard which describes aspects of ontology meta data. Most of the field values will be provided automatically by ontology services. However, they will remain editable in the Pedro Meta Data Editor to allow data curators to update information about terms.
Property: Description
ontology_service_code: a code that uniquely identifies the ontology service which was used to mark up the form field.
ontology_service_name: name of the ontology service
ontology_service_description: description of what the ontology service does.
ontology_service_version: version of the software used to make the ontology service
ontology_service_formalism: different ways of expressing an ontology. Examples include DagEdit and OWL.
ontology_service_email: the e-mail of a contact person associated with the development of the ontology service; for example the programmer who maintains the code for the service
type
identifier
label
definition
comment
example
status
version_information
image: the name of an image file that represents the term, eg: an anatomy diagram or a diagram describing the structure of a chemical compound.
issued
modified
deprecated
super_class
super_property
domain
range
characteristic
inverse_of
replaces
replaced_by

Appendix D: Summary of Design Decisions and Historical Influences for the Pedro Project

Historical Influence 1: The community of potential end-users wanted software that could produce data sets which complied with a formally defined domain model.

Historical Influence 2: the software project was partly funded by the ESNW, an organisation whose remit was service provision, not research.

Historical Influence 3: a year of requirements gathering had been done prior to the initial development of the software tool.

Historical Influence 4: Pedro would be a tool that would be maintained by domain scientists who were not trained software engineers. The tool would have to accommodate frequent changes made to the underlying data model.

Historical Influence 5: To make the tool easy to maintain, it was designed using a model-driven approach.

Historical Influence 6: The model-independent nature of the tool encouraged other domains to use it.
Their feedback helped identify bugs, and led to new features which helped to service the user community the tool was initially commissioned to support.

Historical Influence 7: Pedro’s ability to support other models was greatly improved by the work of another developer who was not funded by the Manchester proteomics group. The collaboration made the software code base more appealing for open-source project work.

Historical Influence 8: the remit of the body funding the software development was broad enough to allow the tool to be applied and modified to suit multiple domains.

Historical Influence 9: the proteomics standards took so long to develop that the software team began to focus on testing the tool on domains which had simpler or more mature data models.

Historical Influence 10: another software engineer was brought in to make a testing plan, rewrite training materials and interact with end-users. His detachment from the code base gave him objectivity in evaluating how well the tool worked for users. It helped eliminate biases that the main programmers might exhibit in justifying their work to end-users.

Historical Influence 11: The Pierre Project was built using the Pedro code base. This helped improve the robustness and extensibility of core Pedro libraries.

Historical Influence 12: A lab scientist guided the development of Tablet Pedro, which could be deployed on a Tablet PC. The development has shown that Pedro can be used in a laboratory, and it promises to attract the interest of other domain groups who gather data in remote areas. It also shows the program can be adapted to generate forms for alternate forms of display.

Design Assumption 1: the underlying data model will change and all model concepts are equally likely to change.

Design Assumption 2: the application would continue to be serviced by scarce developer resources. These people would likely be skilled domain experts but not trained software engineers.

Design Decision 1: Pedro will be developed using a model-driven approach.

Task Constraint 1: Pedro will be designed to support data capture tasks. Although it could have plugins that support other activities, its core architecture will not be designed to suit other tasks. Other activities such as data dissemination, analysis and the provision of security services will be dealt with in separate projects.

Front End Assumption: people using data capture tools will value usability more than accessibility.

Web Technology Assessment 1: Web applications developed to promote widespread access to data should not rely on special technologies for rendering forms. They should use plain HTML forms that can be rendered by all browser client programs.

Web Technology Assessment 2: The Jakarta Struts project was the best web technology evaluated to render Pedro as a web application.

Front End Decision 1: Pedro will be developed as a standalone GUI application rather than as a web application.

Back End Decision: Pedro will store data sets as XML-based documents. Through plugins, it can support committing data in other ways but the tool will not require the presence of a data repository.

Language Decision 1: the MDA Design Tool and applications generated from it will be written using Java.

Generation Decision 1: Application models will be used to generate forms at run-time rather than rely on code-generation facilities.

Generation Decision 2: the MDA Design Tool is a data entry application that uses a model describing configuration properties. The tool will be generated in the same manner as the other applications it helps create.

POSF Decision 1: The data entry schema and the mark-up services will evolve autonomously at different rates. Therefore, decouple these things and support them through separate mechanisms.

POSF Decision 2: The framework should be able to associate multiple mark-up services with the same form field.

POSF Decision 3: The framework should support simple stub ontologies that can be used during rapid prototyping activities.

POSF Decision 4: Base ontology services on ontology identifiers, not word phrases. Each ontology identifier will be associated with a word phrase, and optionally a definition, a URL that may describe a help web page, or an image.

POSF Decision 5: Support multiple formalisms. Do not limit support either for very simple or very sophisticated ontologies.

POSF Decision 6: Let each ontology service comprise one or both of an OntologySource and an OntologyViewer. Each of these objects is described by an interface. An OntologySource provides terms and is designed on behalf of those who maintain ontologies. An OntologyViewer renders terms provided by the OntologySource, and is designed on behalf of those who use ontologies. The ontology service may be configured to mix and match an OntologySource with an OntologyViewer.

POSF Decision 7: make the design of an OntologySource consider whether terms are maintained locally or remotely.

POSF Decision 8: the framework should provide some way of determining whether an OntologySource needs to be updated. End-users should be able to decide whether the ontology service updates itself.

POSF Decision 9: require ontology services to provide meta data information about the ontologies. This information should include the name, author, version, description and kind of formalism supported by an ontology.