Pedro 2.0 Design Document
written by Kevin Garwood
edited and formatted by Chris Garwood
Table of Contents
1 Introduction
2 The Pedro Project Vision
2.1 The Tools
2.1.1 Pedro
2.1.2 Pierre
2.2 Shared Aspects of the Tools
2.3 Using the Tools in Concert
3 Project Environment
3.1 Project History
3.2 Design Forces
3.3.1 Towards a Model Driven Approach
3.3.2.1 Constraining the Task
3.3.2.2 Constraining the Front End
3.3.2.2 Constraining the Back End
3.3.2.2 Constraints on Programming Languages and Generation of Applications
4 Pedro Architecture
4.1 Model-Driven Aspects
4.2 Service Based Aspects
4.2.1 Service Anatomy
4.2.1.1 Task
4.2.1.2 Scope of Effect
4.2.1.3 Persistence
4.2.2 Access to Application Variables
4.2.3 Service Types
4.2.3.1 General Services
4.2.3.2 Specialised Services
4.2.3.2.1 Validation Services
4.2.3.2.2 Ontology Services
4.2.3.2.3 ID Generator Service
5 Description of Subsystems
5.1 Schema Reader
5.1.1 Purpose
5.1.2 Description
5.1.3 Design History
5.1.3 Scope of Effect
5.1.4 Relevant Code Packages
5.2 Native Data Structures
5.2.1 Description
5.2.1.1 RecordModel Properties
5.2.1.2 DataFieldModel Properties
5.2.1.3 EditFieldModel Properties
5.2.1.4 IDFieldModel Properties
5.2.1.5 GroupFieldModel Properties
5.2.1.6 BooleanFieldModel Properties
5.2.1.7 ListFieldModel Properties
5.2.2 Design History
5.2.3 Scope of Effect
5.2.4 Relevant Code Packages
5.3 Pedro Contexts
5.3.1 Purpose
5.3.2 Description
5.3.3 Design History
5.3.4 Scope of Effect
5.3.5 Relevant Code Packages
5.4 Validation Services
5.4.1 Purpose
5.4.2 Description
5.4.3 Design History
5.4.4 Scope of Effect
5.4.5 Relevant Code Packages
5.5 Ontology Services
5.5.1 Purpose
5.5.2 Description
5.5.2.1 Basic Data Structure: Ontology Term
5.5.2.2 Ontology Provenance
5.5.2.3 Ontology Services
5.5.2.4 Ontology Source
5.5.2.5 Ontology Viewer
5.5.2.6 Default Viewer’s Use of Introspection on Ontology Sources
5.5.2.7 The OntologyContext Object
5.5.2.8 A Walkthrough for Selecting an Ontology Term
5.5.3 Design History
5.5.3.1 Decoupling Controlled Vocabularies from Data Models
5.5.3.2 Support for Stub Ontologies for Rapid Prototyping
5.5.3.3 Basing the Framework on Identifiers Instead of Word Phrases
5.5.3.4 Supporting Multiple Formalisms
5.5.3.4 Decoupling Aspects of Model and View in an Ontology Service
5.5.3.5 Consider Local and Remote Ontology Sources
5.5.3.6 Accommodate Updating in Ontology Sources
5.5.3.7 Provide Meta Data about Ontology Services
5.5.4 Scope of Effect
5.5.5 Relevant Code Packages
5.6 ID Generator Services
5.6.1 Purpose
5.6.2 Description
5.6.3 Design History
5.6.4 Scope of Effect
5.6.5 Relevant Code Packages
5.7 Plugins
5.7.1 Purpose
5.7.2 Description
5.7.3 Design History
5.7.4 Scope of Effect
5.7.5 Relevant Code Packages
5.8 Configuration System
5.8.1 Purpose
5.8.2 Description
5.8.2.1 Pedro Configuration Tool
5.8.2.2 ConfigurationReader
5.8.2.3 Other Configuration Files
5.8.3 Design History
5.8.4 Scope of Effect
5.8.5 Relevant Code Packages
5.9 IO
5.9.1 Purpose
5.9.2 Description
5.9.3 Design History
5.9.3.1 Use of Layers
5.9.3.2 Changing Parsers
5.9.3.3 Support for Streams
5.9.3.4 Creating the “Export to Final Submission” Feature
5.9.3.5 Providing Support for the Meta Data Layer
5.9.3.6 Merging dataImport and IO Class Packages
5.9.4 Scope of Effect
5.9.5 Relevant Code Packages
5.10 Alerts
5.10.1 Purpose
5.10.2 Description
5.10.3 Design History
5.10.4 Scope of Effect
5.10.5 Relevant Code Packages
5.11 Meta Data System
5.11.1 Purpose
5.11.2 Description
5.11.2.1 Walkthrough for Capturing Ontology Term Meta Data
5.11.2.2 The Pedro Meta Data Editor
5.11.3 Design History
5.11.4 Scope of Effect
5.11.5 Relevant Packages
5.12 Form Generation Facilities
5.12.1 Purpose
5.12.2 Description
5.12.2.1 General Classes for Generating Desktop Pedro Forms
5.12.2.2 Classes for Generating Edit Fields in Desktop Pedro Forms
5.12.2.3 Classes for Generating List Fields in Desktop Pedro Forms
5.12.2.3 Classes for Generating Forms in Tablet Pedro
5.12.3 Design History
5.12.4 Scope of Effect
5.12.5 Relevant Code Packages
6 Extending the Core Code Base
6.1 Replacing the schema parser
6.2 Adding an extra data layer
6.3 Creating a new field view
6.4 Adding Form Properties
6.5 Creating a Web-based Version of Pedro
6.6 Upgrading to Higher Versions of Java
7 Future Enhancements
7.1 Replacing the Schema Reader’s MSV Parser with Castor
7.1.1 Description
7.1.2 Suggested Approach
7.1.3 Scope of Effect
7.2 Auto-generate Functional Specifications
7.2.1 Description
7.2.2 Suggested Approach
7.2.3 Scope of Effect
7.3 Generate “Test” Feature
7.3.1 Description
7.3.2 Suggested Approach
7.3.3 Scope of Effect
8 Overview of Code Packages
8.1 Package “pedro.configurationTool”
8.2 Package “pedro.desktopDeployment”
8.3 Package “pedro.io”
8.4 Package “pedro.mda.config”
8.5 Package “pedro.mda.model”
8.6 Package “pedro.mda.schema”
8.7 Package “pedro.metaData”
8.8 Package “pedro.soa.alerts”
8.9 Package “pedro.soa.id”
8.10 Package “pedro.soa.ontology.provenance”
8.11 Package “pedro.soa.ontology.sources”
8.12 Package “pedro.soa.ontology.views”
8.13 Package “pedro.soa.plugins”
8.14 Package “pedro.soa.security”
8.15 Package “pedro.soa.validation”
8.16 Package “pedro.tabletDeployment”
8.17 Package “pedro.util”
8.18 Package “pedro.system”
8.19 Package “pedro.soa”
8.20 Package “pedro.workBench”
9 Index
Appendix A: Schema for the Pedro Configuration Tool
A.1 Configuration Options for Menu Features
A.1.1 Class: “menu_features”
A.1.2 Class: “existing_menus”
A.1.3 Class: “file_menu”
A.1.4 Class: “edit_menu”
A.1.5 Class: “options_menu”
A.1.6 Class: “view_menu”
A.1.7 Class: “help_menu”
A.1.8 Class: “help_document”
A.1.9 Class: “plugin”
A.1.10 Class: “custom_menu”
A.2 Configuration Options for Record Structures
A.2.1 Class: “schema_concept_field”
A.2.1 Class “record”
A.2.2 Class “list_field”
A.2.3 Class “edit_field”
A.2.4 Class “attribute_field”
A.3 Configuration Options for Service Classes
A.3.1 Class “service_class”
A.3.2 Class “ontology_service”
A.3.3 Class “list_field_editing_service”
A.3.4 Interfaces Implemented by Service Classes and Plugins
Appendix B: Mapping XML Schema Attributes to Application Properties
Appendix C: Schema for the Pedro Meta Data Editor
C.1 Class: “pedro_meta_data”
C.2 Class: “record_meta_data”
C.3 Class: “field_meta_data”
C.4 Class: “ontology_term”
Appendix D: Summary of Design Decisions and Historical Influences for the Pedro Project
1 Introduction
This document explains how the Pedro code base works. It is a critical part of the vision for this
project that developers have the freedom to download and modify the core architecture to suit their
own needs. We feel that the tools will not reach a broad audience unless developers are confident
that they can re-brand the product to suit their own use cases. Our hope is that if the suite of tools is
explained well enough, it will attract best-of-breed developers to help us provide a
group of free, open-source software tools that can manage complex data.
The developer manual emphasises how parts of the core application work; it is not a tutorial on how
to write plugins. For more information on how to write code modules for Pedro, please consult the
document “Developer Tutorial”.
The document was written in a serial manner so it could be printed out easily. We’ve also
converted the same manual into a collection of HTML files. The manual begins with a description
of the Pedro Project in general. Although the focus of the discussion is on Pedro, it is important to
understand how much of the code base is re-used in related applications that support other kinds
of activities. This description also provides a road-map for future development, which should help
projects evaluate whether the tools are appropriate for their needs.
The discussion about the vision for the project is followed by a section which describes the project
history. In the world of academic research software, Pedro is rare in that most of the enhancements,
suggestions and bug reports have come from groups other than the original group of molecular
biologists it was funded to support. Its generic approach for generating forms has allowed multiple
independent domains to benefit from sharing the costs of testing the software. Finally, the
involvement of other developers has tested the architecture. The design history is interspersed with
a collection of key historical influences and design decisions which are summarised at the end of
the document. These highlights may help inform the design of other products.
Section 4 gives a high-level view of the architecture. It is intended to give developers an
understanding of the general role of system components and the ways they work together. Section
6 describes how the core code base could be extended to support additional features.
Section 8 summarises individual code packages. It is intended to help guide developers
as they navigate through the hundreds of classes that are part of the source code. For more
information about individual classes, please consult the Javadocs that come with the download.
They are not completely filled in, but they do include a summary of what each class does.
The design document comes with four appendices which cover the following themes:
• the XML schema that drives the Pedro Configuration Tool
• a description of how fragments of XML schemas are used to configure native Pedro data structures
• the XML schema that drives the Pedro Meta Data Editor
• a summary of design decisions that have defined the architecture
The design document was written by Kevin Garwood and edited and formatted by Chris Garwood.
We both welcome your suggestions.
2 The Pedro Project Vision
The Pedro Project is a collection of model-driven software tools which are intended to support
simple data management tasks. The project was designed as a tool suite with two components:
Pedro and Pierre. Pedro is a model-driven system for creating data entry applications. It was first
released in February 2003 and continues to be maintained. Pierre is another model-driven system,
which generates front ends that can search and retrieve data from a data repository. It was
first released in April 2006.
The Pedro Project Vision is to use a model-driven approach for creating a suite of software tools
that can provide basic facilities for managing complex data sets. Originally developed to support
the activities of cash-poor molecular biology labs, it has shown promise in areas of clinical
informatics, epidemiology and grid computing.
One of the main driving forces for the development team is to make the Pedro tools able to support
data management needs in the developing world. The software is free, open-source, cross-platform
and is intended to be adapted by a variety of local communities using a minimum of skilled
software developer resources.
This section introduces each of the tools, describes their common characteristics, and shows how
they could be used together to maintain a data repository.
2.1 The Tools
2.1.1 Pedro
Pedro is a system which generates data entry forms to suit concepts defined in an XML Schema.
End-users enter data through the forms to produce XML-based data sets which will validate against
the schema. Figure 2-1 shows an example of a generated Pedro application:
Figure 2-1: An example Pedro Application
The tool promotes high data quality through features which support guided data entry and validation
services. One of its most sophisticated features is its support for marking up form fields with terms
that come from one or more ontologies. The appearance and functionality of generated forms can
be customised via the Pedro Configuration Tool (Figure 2-2).
Figure 2-2: The Pedro Configuration Tool
Using the model-driven approach, Pedro uses data models to generate the generic functionality and
relies on a family of plugins to support domain-specific functionality. Validation services and
general purpose plugins can be developed which have effects at the document, record and field
levels of data entry. Other mark-up services can be developed for field-level entry. The services
can be linked with the parts of the application via the Pedro Configuration tool.
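The way plugins attach at different levels of data entry can be sketched as follows. This is a minimal illustration only: Level, ValidationService, NonEmptyFieldValidator and ServiceRegistry are hypothetical names chosen for the example and are not taken from the Pedro code base. The registry stands in for the links a Data Modeller would set up in the configuration tool.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; none of these types come from the Pedro code base.
enum Level { DOCUMENT, RECORD, FIELD }

// A validation service reports zero or more problems for the value it is given.
interface ValidationService {
    Level level();
    List<String> validate(String target, String value);
}

// A field-level plugin that a Programmer might write for one domain.
class NonEmptyFieldValidator implements ValidationService {
    public Level level() { return Level.FIELD; }

    public List<String> validate(String target, String value) {
        List<String> problems = new ArrayList<>();
        if (value == null || value.trim().isEmpty()) {
            problems.add(target + " must not be empty");
        }
        return problems;
    }
}

// Stands in for the service links that would be defined through a configuration tool.
class ServiceRegistry {
    private final List<ValidationService> services = new ArrayList<>();

    void register(ValidationService service) { services.add(service); }

    // Runs every registered service for the requested level and collects the problems.
    List<String> run(Level level, String target, String value) {
        List<String> problems = new ArrayList<>();
        for (ValidationService service : services) {
            if (service.level() == level) {
                problems.addAll(service.validate(target, value));
            }
        }
        return problems;
    }
}
```

In this sketch the generic application only knows about the ValidationService interface, so domain-specific checks can be added without changing the form-generation code.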
The tool assumes separate roles for Data Modeller and Programmer. The following assumptions
are made about Data Modellers:
 they are domain experts
 they are not expected to know how to write software
 they use the Pedro Configuration Tool to generate test applications for an end-user
community.
Assumptions made about Programmers are:
 they know how to write programs
 they are not expected to be domain experts
 they write domain-specific plugins which customise Pedro for a given use case
2.1.2 Pierre
Pierre is a model-driven system for generating applications that search and retrieve information
from a data repository. The system can generate multiple front-ends which can interact with an
abstract repository that is implemented using technologies such as relational, XML or object-oriented databases.
The system assumes separate roles for Repository Designer and Service Designer. The following
assumptions are made about Repository Designers:
 they are programmers
 they are not expected to be domain experts
 they write queries and reports that work with a given data repository
Service designers have these characteristics:
 they are domain experts
 they are not expected to be programmers
 they rapidly prototype a specification for a service by generating test applications for end-users.
These roles are essentially the same as those developed in Pedro, except the roles for Pierre are
more specific. Service designers are not so much designing a single application but a service which
can be used to generate multiple kinds of search applications at once. Repository designers are
programmers who spend most of their effort writing database queries that satisfy the needs of the
user community.
Building a service begins when the Repository Designers and Service Designers agree on an XML
schema which describes the concepts a data repository can publish to the outside world. It forms a
kind of broad contract that allows them to work independently of one another. While the
Repository Designers construct or modify their database, the Service Designers can begin using the
Pierre Configuration Tool (Figure 2-3).
Figure 2-3: The Pierre Configuration Tool for Pierre v2.0 (current release is Pierre 1.0a)
The Service Designers use the tool to create a specification for a search and retrieve service. Most
of the queries are defined in terms of concepts that have been defined in an XML schema. The
designers can use the tool to generate a prototype application and use this to elicit feedback from
end-users (Figure 2-4).
Figure 2-4: A test application generated by Pierre
During this rapid prototyping phase the test application is linked to a dummy data repository which
returns junk data results. This is done to provide an idea of what to expect from a real data
repository (Figure 2-5).
Figure 2-5: Junk data results for the query made in Figure 2-4.
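The substitution of a dummy repository for a live one can be illustrated with a small sketch. The names DataRepository, DummyRepository and SearchFrontEnd are assumptions made for this example and do not describe Pierre's actual API. The point is only that the front end depends on an interface, so a placeholder implementation can be used during prototyping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names; a sketch of the idea, not Pierre's actual API.
interface DataRepository {
    List<String> runQuery(String queryName, int maxResults);
}

// Returns placeholder rows so front ends can be prototyped before a real database exists.
class DummyRepository implements DataRepository {
    public List<String> runQuery(String queryName, int maxResults) {
        List<String> results = new ArrayList<>();
        for (int i = 1; i <= maxResults; i++) {
            results.add(queryName + "-junk-" + i);
        }
        return results;
    }
}

// The front end depends only on the interface, so a live repository can be swapped in later.
class SearchFrontEnd {
    private final DataRepository repository;

    SearchFrontEnd(DataRepository repository) { this.repository = repository; }

    // Renders one result per line for a named query.
    String render(String queryName) {
        return String.join("\n", repository.runQuery(queryName, 3));
    }
}
```

Replacing new DummyRepository() with a class that runs real queries would leave SearchFrontEnd unchanged, which is the property the prototyping phase relies on.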
When the service is completed, the service designers can generate a functional specification which
can inform the Repository Designers about what queries need to be supported (Figure 2-6).
Figure 2-6: Functional specifications for search and retrieval service generated by Pierre.
The Repository Designers then write code for the queries and fulfil the user requirements. Once the
repository has been finished, the service designers can substitute the dummy repository for the live
one (Figure 2-7).
Figure 2-7: A test application generated by Pierre that interacts with a live data repository
When all the designers and the end-users are happy with the service, the service designers can use
the Pierre Configuration Tool to automatically generate multiple front ends which interact with the
same repository (Figure 2-8). These front ends include a command line service; a text-based menu-driven application; a standalone GUI application; and a web application.
Figure 2-8: Generating multiple front ends with Pierre.
2.2 Shared Aspects of the Tools
Pedro and Pierre will use the same engine for interpreting XML schemas, but will associate
different properties with each concept:
 Pedro associates concepts with properties of data entry functionality
 Pierre associates concepts with properties of data dissemination functionality
As well as sharing the same engine to interpret schemas, most of the user interfaces for the utility
and configuration tools will be generated using Pedro. For example, the Pedro Configuration Tool
is an instance of a generated Pedro application which is customised with plugins that help Data
Modellers. Pedro’s Meta Data Editor is another generated application which allows Data Curators
to modify the meta-data that is kept for each data file. When Pierre 2.0 is released, the Pierre
Configuration Tool will be another instance of Pedro that is customised to generate data
dissemination services. Figure 2-10 shows the family of the products that will all use Pedro as the
basis for data entry:
Figure 2-10: Tools for the Pedro Project that will use customised versions of Pedro to support data
entry tasks. These include the generated Pedro applications for end-users; the Pedro
Configuration Tool; the Pedro Meta Data Editor; the Pierre Configuration Tool.
Reusing Pedro in these ways helps test the core code base and reduces the amount of new code that
needs to be developed and tested.
Pedro and Pierre share many common form features. They both support field and record-level
validation services. They also use the same system for marking up form fields with ontology terms.
The Pedro Ontology Service Framework is a shared subsystem that allows end-users to mark up
forms with terms from one or more ontologies. The shared form features mean that the data quality
of query submission will be as good as the quality of data curation.
There are other examples of code being reused in the tool suite. For example, the Pedro Alerts
Editor allows users to associate a set of matching criteria with an intent such as an error or a
warning. The UI for defining the matching criteria is found in two other places:
 the advanced search feature of the Pedro Configuration Tool
 the advanced search feature in the standalone application generated by Pierre.
Another aspect the tools share is support for both rapid-prototyping and deployment phases of
development. They are designed so that changes in the schema or the service description can be
automatically propagated to the applications.
This allows service designers to rapidly elicit feedback from end-users via auto-generated
applications. Projects can choose to limit their use of the tools to gathering requirements for
software they will create themselves. Alternatively, they can choose to use the generated
applications in a production setting. The use of a model-driven approach allows developers to
control the extent to which they commit to using a new technology.
Although the tools share many features and parts of the same code base, they are intended to be
marketed as independent applications. For example, Pedro does not require users to know about
Pierre. Pierre will rely on code used in Pedro but users are not expected to download Pedro to make
Pierre work. Marketing the tools independently is another way of allowing developers
to limit their investment in the technology. The limited remit of each tool allows them to be used as
lightweight components in a larger system.
2.3 Using the Tools in Concert
The tools could be used together in a use case scenario that has two phases. During the data entry
phase of deployment, end-users could use Pedro to create XML data sets.
Once the data sets have been created, they can be used to create a data repository. The XML files
could be placed in an XML repository such as eXist. Alternatively, developers could create scripts
which extract specific fields from the data sets to make custom purpose repositories.
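A script of the kind described above could be as small as the following sketch, which uses the standard Java XML and XPath APIs. The FieldExtractor class and the element names used in the usage note (experiment, sample, name, operator) are hypothetical and serve only to illustrate the idea.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper; not part of the Pedro distribution.
class FieldExtractor {
    // Returns the text of every node matched by the XPath expression.
    static List<String> extract(String xml, String xpathExpression) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(xpathExpression, document, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            // Malformed XML or an invalid expression is reported as an unchecked error.
            throw new IllegalStateException("Could not extract " + xpathExpression, e);
        }
    }
}
```

For a data set containing sample records with both a name and an operator, calling FieldExtractor.extract(xml, "/experiment/sample/name") would copy only the sample names into the custom repository and leave the operator details behind in the original file.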
This approach could have a number of benefits. First, sensitive data not related to the expected use
of the repository could be left out. Second, the original data sets are preserved, which provides a
form of backup. Third, databases could be heavily optimised for certain types of queries. One
difficulty we observe in bioinformatics repositories is they tend to include a vast amount of
machine-generated data that are relevant for analysis tasks but are not relevant for search queries.
Pierre could be used to provide end-users with multiple front-ends that interact with the same data
repository. The schema used to make a dissemination service would be limited to having those
concepts which can be published. Repository Designers who manage large complex databases
could leave out concepts which didn’t address specific queries, or which only existed to service the
database. For example, the foreign key references in a relational database could be left out of the
schema because they would not be useful query concepts for an end-user.
3 Project Environment
3.1 Project History
Pedro is a software tool which was first released in February 2003 and has been maintained for the
past three years. It is intended to be a model-driven data capture tool that can be used to create data
sets in a number of domains. The tool is a generic software application that can be customised for
domain-specific tasks in a number of ways. It is designed so that much of the data modelling and
documentation can be done by a domain scientist who is not a software engineer. When software
developers are needed to adapt the tool, their efforts can be limited to developing plugins that
support domain-specific activities.
There are a number of historical aspects of design which have helped make the tool suitable for
supporting this use case scenario. They are summarised throughout the following description of the
project’s development history. These points may be helpful in evaluating the suitability of the tool
for a new project setting.
The development of Pedro has been heavily influenced by a community of molecular biologists
who want to standardise the format, structure and content of electronic data sets which describe
their experiments. A growing number of projects in molecular biology are trying to express their
experiment designs in terms of formal data models. Their hope is that making their electronic
records comply with the model will lead to a greater level of uniformity in the data sets that appear
in public data repositories. Creating model-compliant data sets can establish a level of data quality
that makes the files easier to exchange between members of the same lab, members of different
collaborating labs or between members of the broader bioinformatics community. Furthermore, it
is easier for research groups to develop analysis programs that scan data repositories when the data
sets have a common structure.
Throughout 2002, Manchester University was involved with the Proteomics Standards Initiative
(PSI), a consortium of scientists who were developing a community data model that described
proteomics experiments.
Like many standards bodies, the PSI relied on a committee of volunteer participants who met on a
semi-regular basis. They assessed a number of experiment use cases and began to develop a model
that would formally describe them.
Usually, the speed at which standards develop in bioinformatics is much slower than the speed at
which individual labs produce and make changes to data sets. Often, the IT systems used to
manage electronic laboratory data have to be adapted so that they export and import data in a form
that complies with the new community standard. Laboratories tend to have limited access to
software developers and by the time their systems comply with the new model, either the
community standard or the local experiment model has evolved. This makes it difficult to create
data repositories that have uniformly structured data sets.
Manchester’s proteomics community began to feel that for a standard to be widely adopted, there
would have to be evidence that it could be implemented with some kind of data capture tool.
Historical Influence 1: The community of potential end-users wanted software that could
produce data sets which complied with a formally defined domain model.
Developing a data capture tool to suit a complex data model could take a long time.
Furthermore, Manchester’s proteomics group didn’t have much access to software
engineers. Like many other molecular biology labs, they were funded to do research, not
development. For them, software development was an overhead: a means to an end rather than an end in itself.
In Autumn 2002, I was working as a contract programmer for the E-Science North West Centre.
The ESNW fostered projects that helped groups apply technology developed at institutions in the
North West. It remains a service-based organisation which tries to provide help to ongoing projects.
One of their services was providing projects with help developing software applications. For the
organisation, developing applications was an end in itself because its focus was on service
provision, not research.
Historical Influence 2: the software project was partly funded by the ESNW, an
organisation whose remit was service provision, not research.
The attitude of the organisation was that the software should support end-user activities. This
helped make the development of Pedro different from other projects I’ve worked on, where the
emphasis was on developing software that minimally met the needs of some research grant.
I was assigned to the Pedro Project in September 2002. By this time, the project manager was
Norman Paton and the main researcher was Chris Taylor. Chris is a geneticist who was heavily
involved in developing models for the Proteomics Standards Initiative (PSI). He was also tasked
with developing the software, but this second task presented two problems:
 He was a domain scientist but not a formally trained software engineer
 His duties for developing standards left little time to develop supporting software
I was brought onboard the project for a period of five months, after which I would be reassigned to
another activity. Chris Taylor had already been developing the PEDRo standard for many months.
During that time, he had also developed a very primitive prototype of what some of the forms might
look like. His work provided a list of requirements gathered over the course of the preceding year.
The clarity of the requirements meant that I did not have to spend the time interviewing clients
myself.
Historical Influence 3: a year of requirements gathering had been done prior to the initial
development of the software tool.
Norman foresaw two things that would happen after I left the project. Together they form the next
historical influence:
Historical Influence 4: Pedro would be a tool that would be maintained by domain
scientists who were not trained software engineers. The tool would have to accommodate
frequent changes made to the underlying data model.
The most efficient way to build the tool would be to make a number of bespoke application forms
that supported a snapshot of the model. This would require the least amount of up-front design
work and would yield a working prototype in the shortest amount of time.
However, the most important aspect of the tool appeared to be its ability to be maintained and
accommodate change. There was no way of assessing which part of the PEDRo standard would
change in the coming year, so the application could not attach semantic significance to any of the
model’s concepts. Norman suggested that I make a tool that generated forms based on a formal
data model.
Historical Influence 5: To make the tool easy to maintain, it was designed using a model-driven approach.
The tool was designed to be independent of the data model it used to generate forms. This had two
consequences:
 It allowed me to develop the tool without first acquiring competence in the proteomics
domain
 It allowed Chris Taylor to develop the data model without requiring much knowledge about
how the tool worked.
This effectively promoted aspects of parallel development for the same software tool. It created a
separation of design concerns which suited the skill sets of a domain scientist and a software
engineer.
Chris Taylor used the XML Schema language to express the domain model. He had to limit the
structures he used because the tool wasn’t able to interpret all of them. However, the result was
adequate to present the PEDRo model to end-users. He observed that many scientists were
better at giving feedback on the model via the application forms than through a complex UML
diagram. This lowered the level of skill biologists needed in order to participate in the
data modelling process.
The first prototype of the tool gained the interest of the MyGrid project. They wanted to use Pedro
forms to create descriptions of bioinformatics services. In particular, they were interested in the
tool’s ability to mark up form fields using key terms from one or more controlled vocabularies. The
MyGrid team made suggestions that led to the development of Pedro’s Ontology Service
Framework. This framework was published in a paper at the European Semantic Web Conference
in 2004.
MyGrid’s involvement represented the first influence of a domain outside of proteomics. Other
groups would express interest in the coming months. In its first year and a half of
release, Pedro was used as a rapid prototyping tool to help validate complex models with end-users.
Historical Influence 6: The model-independent nature of the tool encouraged other
domains to use it. Their feedback helped identify bugs, and led to new features which
helped to service the user community the tool was initially commissioned to support.
Pedro eventually caught the interest of the EBI. They wanted to use the tool with other models that
were too complicated for the tool to handle. To overcome this problem, Kai Runte was hired. He
was responsible for developing a new schema reader which was based on Sun Microsystems’ MSV
schema reader project. Through his work, he compelled me to explain how parts of the code base
worked. This is a vital aspect of developing open source projects that work for other groups.
When he finished developing the schema reader, he tested it on a schema that was auto-generated
from a DTD of the MAGE model. MAGE was a very complex model which described microarray
experiments.
Kai’s effort made Pedro able to support a much broader range of complex data models than what
the tool could support before. This helped make the tool more appealing to other domains. He was
also the first external developer who was invited to make significant changes to the code base. This
is an important event in the development of any open source project:
Historical Influence 7: Pedro’s ability to support other models was greatly improved by the
work of another developer who was not funded by the Manchester proteomics group. The
collaboration made the software code base more appealing for open-source project work.
I continued to make fixes to the code base long after my contract to serve the proteomics group had
finished. The ESNW began to recognise that the tool could be applied to other domains, so it
encouraged me to continue my interactions with multiple domains.
Historical Influence 8: the remit of the body funding the software development was broad
enough to allow the tool to be applied and modified to suit multiple domains.
The proteomics groups at Manchester continue to focus their efforts on the development of standard
models. There was a reluctance to deploy the tool in the domain until the model had stabilised.
This meant that the bottleneck for software release was not the development of the software but the
development of a particular model.
These circumstances helped make it acceptable to consult other domains whose models were either
simpler or more mature. The feedback these groups provided helped improve the tool as it would
be used by proteomics scientists.
Historical Influence 9: the proteomics standards took so long to develop that the software
team began to focus on testing the tool on domains which had simpler or more mature data
models.
Eventually people began using the tool for data entry rather than simply as a rapid prototyping tool.
This change in user habits necessitated a superior level of documentation, testing and end-user
training that would make Pedro a production-quality tool. The problem was that being the main
developer on the project, the program made sense to me. Therefore, I would think it made sense to
people using it as well. This is what I refer to as the developer blind spot and it is why core
developers should never be in charge of documenting their own products for end-users.
In 2004, Chris Garwood joined the team and became responsible for helping to make the tool a
product that could be used in day to day activities. He wrote a test plan, rewrote all training
materials and evaluated the tool with end-users. This produced an important separation of roles on
the project which would benefit the people using it:
Historical Influence 10: another software engineer was brought in to make a testing plan,
rewrite training materials and interact with end-users. His detachment from the code base
gave him objectivity in evaluating how well the tool worked for users. It helped eliminate
biases main programmers would exhibit in justifying their work to end-users.
The success of Pedro led to a follow-on project called Pierre. Pierre applied the same model-driven
philosophy to generate search and browse query forms that interacted with a data
repository. Much of the form generation functionality was borrowed from the Pedro code base.
Historical Influence 11: The Pierre Project was built using the Pedro code base. This
helped improve the robustness and extensibility of core Pedro libraries.
In the Spring of 2006, Chris fielded a request made by Jennifer Lynch, a mass spectrometer lab
scientist who was working with the Manchester proteomics group. She liked the tool but wanted a
version that would be able to work on some portable computing device. She explained how the
program was difficult to use in the lab because it needed to be installed on a desktop. The
constraints and safety regulations in a molecular biology lab make it difficult to transcribe data
directly to a laptop or a desktop computer.
After three interviews and ten business days, we produced Tablet Pedro, a version of the tool which
would work on a Tablet PC. Over 90% of the Pedro code base was re-used, thus showing the value
of using a model driven approach that could generate forms for different kinds of deployments. The
advent of Tablet Pedro has since attracted greater interest from wet labs and new interest from
research projects that do data entry in rugged outdoor settings.
Historical Influence 12: A lab scientist guided the development of Tablet Pedro, which
could be deployed on a Tablet PC. The development has shown that Pedro can be used in a
laboratory, and it promises to attract the interest of other domain groups who gather data in
remote areas. It also shows the program can be adapted to generate forms for alternate
forms of display.
Since 2004, I’ve been interested in using Pedro to help service research projects in the developing
world. I’ve gathered a number of requirements from interactions with organisations involved with
activities having this theme:
 The tool should support languages other than English
 The tool should support different kinds of form fields which may describe images, audio and
video clips.
 The tool should be documented well enough to not require my involvement
 The tool would have to show greater levels of customisation to suit other domains.
We’ve done a number of things to help meet these requirements:
3.2 Design Forces
This section describes the major design decisions which shaped the development of Pedro. Other
minor design decisions are described under descriptions of subsystems.
3.3.1 Towards a Model Driven Approach
Two major design assumptions motivated us to adopt a model-driven approach to creating a data
entry tool. The first assumption was that the model would continue to evolve rapidly and that all
model concepts were equally likely to change. At the onset of development, there would have been
a high overhead making manual changes in the code to suit a model that was changing every week.
We were concerned that if the initial prototype of the tool underwent too many manual changes, the
end-product would be error-prone, hard to extend and unlikely to perform when it was used in a
production setting. Therefore, this defined our first major design assumption:
Design Assumption 1: the underlying data model will change and all model concepts are
equally likely to change.
The assumption caused us to design first and foremost to accommodate change in the model. This
began a process of decoupling the model concepts from the ways they were visualised. Regardless
of what model concepts were added, deleted or modified, the form fields were rendered in the same
ways. For example, a radio button would always be rendered the same way, an integer field would
produce errors if letters were typed in it and a list field would always have “New” and “Edit”
buttons. Development effort focused on a small collection of form field rendering classes and these
were tested independently of the model.
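The decoupling described above can be sketched in a few lines. FieldType and FormField are hypothetical names used for illustration and are not Pedro's actual rendering classes. The behaviour of a field is decided by its type alone, so the same class can be tested without reference to any particular model concept.

```java
import java.util.Optional;

// Hypothetical types; a sketch of the decoupling, not Pedro's actual rendering classes.
enum FieldType { TEXT, INTEGER }

class FormField {
    final String name;
    final FieldType type;

    FormField(String name, FieldType type) {
        this.name = name;
        this.type = type;
    }

    // Behaviour depends only on the field type, never on what the field means in the model.
    // Returns an error message if the input is unacceptable, otherwise an empty Optional.
    Optional<String> check(String input) {
        if (type == FieldType.INTEGER) {
            try {
                Integer.parseInt(input.trim());
            } catch (NumberFormatException e) {
                return Optional.of(name + ": \"" + input + "\" is not an integer");
            }
        }
        return Optional.empty();
    }
}
```

Renaming a model concept, or adding a new integer-valued one, changes only the name passed to the constructor; the checking logic and its tests stay the same.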
The second design assumption characterised the developers who would maintain the product in the
future:
Design Assumption 2: the application would continue to be serviced by scarce developer
resources. These people would likely be skilled domain experts but not trained software
engineers.
Applying this assumption helped us gauge the kind of software maintenance activities a typical
developer would be capable of carrying out. We assumed their programming experience would be
limited to the production of scripts that emphasised procedural rather than object-oriented
programming. They would be unlikely to be able to fix bugs in, or maintain, the complex collection of
classes. This was especially true of a largely uncommented code base that was the result of an
initial prototype. Instead, the programming efforts would have to be limited to the production of
modules which interacted with the rest of the program via well-defined interfaces. The content of
these modules could retain the procedural style of programming to which they were accustomed.
Design Decision 1: Pedro will be developed using a model-driven approach.
Model-driven systems run off a model that describes properties of a software application. The
decisions about what properties should be included in the model are at the discretion of the
developers who are creating the systems. We realised we had to strike a balance between including
enough configuration options to make Pedro applicable to a wide range of use cases, yet make it
simple enough to appeal to developers who were not necessarily trained software engineers.
The more configurable an application is, the more use cases it can support. However, increasing the
number of configurable options also increases the learning curve for any developers who use the
system. Using a model-driven system ceases to be feasible if the amount of effort to configure an
application is as much as the effort needed to code one from scratch.
We also had to consider how many developer resources were available to write code that supports
the set of auto-generated features we wanted. Together, these forces motivated us to try and
simplify the application model.
We began by envisioning an idealised system for managing a data repository (Figure 3-1).
Figure 3-1: An example of an idealised system for managing a data repository.
Aspects of the system were evaluated and either removed from the scope of development, hard-coded or expressed as configuration options. The following subsections describe how we tried to
limit the application model.
3.3.2.1 Constraining the Task
The most important way of simplifying the application model is to realise that the tasks of data
capture, data dissemination, analysis and the provision of security services can each be addressed
through separate applications. I speculate that the design of the typical repository shown in Figure
3-1 is influenced by the following key technology decisions:
 the application is deployed via the web because that medium is most popular for supporting
search and retrieval services
 special technologies are used to enhance the usability of web forms. This is done to help
support data capture activities
 data sets are managed in a data repository to support data dissemination and analysis
activities
 the data repository is usually organised as a relational database to benefit analysis programs
I believe that the decision to support all the major tasks in one application commits each task to
being supported by technologies that are better designed to support other activities. By resolving
the tasks into separate applications, the overall application model is simplified and better
technology choices are made for supporting individual tasks. The separation of tasks is shown in
Figure 3-2:
Figure 3-2: Simplifying the application models by separating tasks
For the development of most repositories, the main tasks have the following ranking from most to
least important:
1) data capture; to populate the repository
2) data dissemination; to search and retrieve parts or all of data sets that match selection criteria
3) analysis; to apply various algorithms to a large number of data sets.
The provision of security is usually considered in the initial stages of a project but is the last thing to be
implemented. Given the limited developer resources initially assigned to the project, the focus of
the tool became data capture. Although the data capture tool would have provisions for plugins
which could support other tasks, the core part of the architecture would not be designed to support
these other tasks.
Task Constraint 1: Pedro will be designed to support data capture tasks. Although it could
have plugins that support other activities, its core architecture will not be designed to suit
other tasks. Other activities such as data dissemination, analysis and the provision of
security services will be dealt with in separate projects.
The application to generate then becomes a data capture tool. Further constraints can be made to
the front end user-interface and the back end storage of data sets.
3.3.2.2 Constraining the Front End
Data repository applications can be deployed in a number of ways, including a web application, a
standalone GUI application and a command-line service. Although the web is a popular form of
deployment for data dissemination tasks, it is less suited to support complex data entry activities.
This is due to the differences in habits between people who produce data sets and others who use
them.
In any given project, there will typically be a small number of data producers and a relatively large
number of data consumers. Data producers will usually work either in the lab which manages the
repository or for one of the lab’s partner organisations. The curators will spend long periods of time
using data capture tools and will value usability more than accessibility in the software they use.
Data consumers will usually be spread out over different locations around the world. They will use
the repository sporadically and will typically spend a short time trying to retrieve data sets that
match simple selection criteria. They will value accessibility over usability in the applications they
use.
Front End Assumption: people using data entry tools to record complex data will value
usability more than accessibility.
The design of Pedro’s front-end had to consider usability first and accessibility second. Although
Pedro was originally going to be developed as a web application, this deployment form was rejected in favour
of a standalone GUI application. Three factors influenced this decision:
• usability
• development time
• performance.
Web forms tend to have poor usability because they are made with a limited collection of simple
form objects such as labels, fields and buttons. Technologies have been developed to enhance the
forms so they are almost as usable as the same forms would be in standalone GUI applications.
However, the enhancements are not universally supported by various browser clients such as
Internet Explorer, Netscape, Mozilla and Firefox. Therefore, relying on these technologies to build
data capture applications would invite platform dependencies that would undermine the web’s
appeal for promoting widespread access to data. To maintain the aspect of universal access, web
technologies for building the GUI would have to render plain HTML forms that would be supported
in all browser client programs.
Web Technology Assessment 1: Web applications developed to promote widespread
access to data should not rely on special technologies for rendering forms. They should use
plain HTML forms that can be rendered by all browser client programs.
At the outset of the project, a number of web technologies were evaluated for creating the front
end of the data entry tool. In bioinformatics, the use of Perl and CGI scripts is popular for making
simple web forms. However, I felt that Perl was best suited to simple scripts and that its
object-oriented features were too limited to support large systems. Instead, Java-based
technologies were considered because the language scaled better as applications became more
complex. Of the Java-based technologies, only those which generated plain HTML forms were
considered. At the end of the evaluation, I decided that the best web technology candidate to use
would be Jakarta’s Struts project. It combined the use of Struts libraries, Java Server Pages (JSP)
and Java Servlets. This approach had a number of advantages:
• the framework supported arbitrarily complex applications better than other technologies
• the framework separated the model from the view aspects of an application; this suited a model-driven approach
• applications produced HTML pages that could be rendered using any browser client
• it depended on a suite of technologies which were all written using Java.
Overall, this technology would have been the most suitable to use for rendering Pedro as a web
application.
Web Technology Assessment 2: The Jakarta Struts project was the best web
technology evaluated to render Pedro as a web application.
However, the Struts project also had a number of drawbacks:
• each form required screen presentation, action handler and business object layers. While
this approach allowed the framework to support complex applications, it required tedious
programming effort to coordinate the layers.
• the applications were difficult to test.
Any scheme for auto-generating the application would have to take into account the coordination of
three layers. In contrast, generating an application with Java Swing screen objects was far easier to
do and took less effort to design and test. The screen objects were more flexible to configure than
simple web form objects and made the standalone GUI application more usable than a web
application.
Finally, standalone GUI applications may provide better performance for end-users. Web
applications are run within browser clients that render data on the screen. Some of these programs
have difficulty rendering large data sets. This is much less of an issue for standalone applications.
Front End Decision 1: Pedro will be developed as a standalone GUI application rather
than as a web application.
3.3.2.2 Constraining the Back End
Data entry tools often don’t require the presence of a data repository. Data dissemination services
require one in order to return part or all of data sets that match selection criteria. They usually focus
on the few human-readable fields that describe each data set. Analysis services often require data
repositories to be organised in ways that make data sets more amenable to computationally
intensive operations. They focus on the large volumes of machine-generated data that appear in one
or more data sets.
The needs of data dissemination and data analysis services are not shared by data capture services.
A curator will typically edit only one or a few data sets in one session. Once the data sets are
submitted for publication, they are not likely to be edited again. Therefore, curators who are editing
a single data set do not require access to a large data repository.
Making a data capture tool require the presence of a data repository invites an overhead of
technologies that are designed to suit other tasks. The requirement makes the model-driven
approach more difficult because Pedro would have to generate code in some kind of database
language such as SQL. Installing the generated application could be complicated by the need to
install a database. The applications would also be tied to code library dependencies that were
inherent in whatever database technology was used to manage the data sets.
Instead, Pedro should store each data set as an XML document that can be validated against the domain
model. XML is a data format that is widely used in bioinformatics. Apart from making a
model-driven approach easier to implement, the use of documents frees developers from relying on one
monolithic database.
Programs could be developed which extract field values from a collection of master files and use
them to populate specialised databases. For example, consider how a repository could be designed
to suit a simple search and retrieval service. Only the metadata from each data set would need to
be included in a database. Data sets which match selection criteria could be downloaded by the
users, who could then view them using the data capture tool.
Other use cases may require different parts of the master data files to be extracted. This flexibility
allows developers to create databases that are optimised to suit different tasks. It also allows them
to migrate from using one data repository technology to another.
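As an illustration of the extraction idea above, the following is a minimal sketch of a program that pulls a few metadata fields out of a master XML document so they could be loaded into a search database. The class name MetadataExtractor, the element names and the XPath expressions are all hypothetical; only the standard javax.xml.xpath API is assumed.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Pedro: extracts selected field values from a master file.
class MetadataExtractor {

    /**
     * Evaluates each XPath expression against the document text and returns
     * a map of column name to extracted value (empty string if nothing matched).
     */
    static Map<String, String> extract(String xml, Map<String, String> fieldPaths) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> row = new LinkedHashMap<>();
        try {
            for (Map.Entry<String, String> field : fieldPaths.entrySet()) {
                // The document is re-parsed for each expression to keep the sketch short.
                String value = xpath.evaluate(field.getValue(),
                        new InputSource(new StringReader(xml)));
                row.put(field.getKey(), value.trim());
            }
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate XPath expression", e);
        }
        return row;
    }
}
```

The resulting map could be written to whichever database suits the search service, while the bulky machine-generated values stay in the master file.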
Developers could write their own services to write data sets to different formats. However, the core
architecture will use XML documents to encode data sets.
Back End Decision: Pedro will store data sets as XML-based documents. Through plugins,
it can support committing data in other ways but the tool will not require the presence of a
data repository.
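To make the decision concrete, here is a minimal sketch of how a data set stored as an XML document can be checked against an XML Schema using the standard javax.xml.validation API, with no database involved. The class name DataSetValidator and the tiny Sample schema are assumptions made for illustration; they are not taken from Pedro's code base.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of Pedro: validates a data set document against a schema.
class DataSetValidator {

    // Illustrative schema: a Sample record with a required name and an optional comment.
    static final String SAMPLE_SCHEMA =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='Sample'><xs:complexType><xs:sequence>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='comment' type='xs:string' minOccurs='0'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Returns true if the document text validates against the schema text. */
    static boolean isValid(String schemaText, String documentText) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(schemaText)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(documentText)));
            return true;
        } catch (SAXException e) {
            // Raised for validation errors (and also for a malformed schema).
            return false;
        } catch (IOException e) {
            throw new IllegalStateException("could not read in-memory document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(SAMPLE_SCHEMA, "<Sample><name>S1</name></Sample>"));
        System.out.println(isValid(SAMPLE_SCHEMA, "<Sample><comment>no name</comment></Sample>"));
    }
}
```

The second document is rejected because the required name element is missing.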
3.3.2.2 Constraints on Programming Languages and Generation of Applications
Many model-driven systems allow designers to auto-generate application code in multiple target
languages such as Java, C, C++ or Perl.
Java was used to develop the system for a number of reasons:
• developer’s preference: Although I’ve done development in C, C++ and Perl, I had the
most experience with Java. In the five-month time frame allotted to develop an initial
prototype, Java represented the smallest learning curve to begin coding.
• in-house experience: the two most popular languages used to develop bioinformatics
applications were Perl and Java. Using Java increased the likelihood that others could
contribute to the project.
• scalability: Java is object-oriented and scales to large systems of classes. I felt
Perl was good for scripts but poor for designing large complex systems.
Language Decision 1: the Pedro system and applications generated from it will be written
using Java.
It seems that it is better to use application models to generate forms at run-time rather than rely on
code-generation facilities. Generating code in multiple languages was not regarded as a priority
because Java code will run and operate on all the machines used by prospective end-users.
Eliminating code generation facilities removes the need to develop code to mechanically generate
Java classes. It also removes the need to have configuration options associated with the activity
appear in the application model.
Auto-generated applications tend to be difficult for developers to understand or adapt. Developers
would prefer to extend or modify a well-documented code base that interprets an application
model at run-time.
Generation Decision 1: Application models will be used to generate forms at run-time
rather than rely on code-generation facilities.
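The difference between interpreting a model at run-time and generating code can be sketched as follows. The example holds a hypothetical application model as plain data and lays out a text form from it each time it is asked; no Java source is produced. FieldSpec, RuntimeFormSketch and the text widgets are invented for illustration and stand in for Pedro's real model classes and Swing components.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a form is produced by interpreting a model, not by generating classes.
class RuntimeFormSketch {

    /** One field of an illustrative application model. */
    static class FieldSpec {
        final String name;
        final boolean required;
        final List<String> choices; // empty for free-text fields

        FieldSpec(String name, boolean required, List<String> choices) {
            this.name = name;
            this.required = required;
            this.choices = choices;
        }
    }

    /** Walks the model and returns one line per form element. */
    static List<String> render(String recordName, List<FieldSpec> fields) {
        List<String> lines = new ArrayList<>();
        lines.add("== " + recordName + " ==");
        for (FieldSpec field : fields) {
            String label = field.name + (field.required ? " *" : "");
            String widget = field.choices.isEmpty()
                    ? "[ text ]"
                    : "( " + String.join(" | ", field.choices) + " )";
            lines.add(label + ": " + widget);
        }
        return lines;
    }
}
```

Changing the model changes the form the next time it is rendered, without a compile step, which is the property the decision above relies on.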
The Pedro Configuration Tool itself represents a data capture tool that uses the application model.
In the initial releases of the tool, applications were configured by manually editing a configuration
file which associated various application properties with concepts in the domain model. However,
over time this model grew more complicated. Eventually we were reminded of the balance point
where it takes as much effort to configure an application as it does to write one from scratch. To
make configuring the applications easier to do, a configuration tool was developed. This is the
MDA design tool, which is itself generated from an application model that describes all the configuration
features that can be associated with other models.
Generation Decision 2: the MDA Design tool is a data entry application that uses a model
describing configuration properties. The tool will be generated in the same manner as the
other applications it helps create.
4 Pedro Architecture
Pedro uses a model-driven approach to generate generic application features and a collection of
plugins to support specialised features. Application forms are generated based on a data model that
describes the records and fields which may appear in a document. Form data are manipulated by
standard application features or by a collection of plugins supplied by other developers. Figure 4-1
shows the architecture for the tool. The numbers in the diagram label the sequence of events
associated with opening and editing a file. The figure is referenced in the next two sub-sections that
describe the model-driven aspects and service aspects of the design.
Figure 4-1: Architecture for Pedro
4.1 Model-Driven Aspects
The process begins when Pedro interprets a data model. A Schema Reader reads an XML Schema
which describes records and fields that can appear in a document. Although XML Schema is a very
expressive language, there are some application features that it cannot express. Pedro compensates
for this by having a Configuration File which maps schema concepts to various application
properties and services.
Information provided by the Schema Reader and the Configuration Reader is combined to create
templates of native data structures which represent form records. These templates are managed by
RecordModelFactory (1) and are instantiated whenever Pedro needs to create new record objects to
hold data (2).
Pedro serialises its data as XML files which validate against the XML Schema. By default,
documents are saved as *.pdz files. This native file format is a ZIP file containing multiple layers
of information. The application can also import or export the form data as an XML file. This
format is used to produce versions of the document that are submitted to data repositories. When
Pedro reads a document, its I/O routines use the RecordModelFactory to create a tree of data
objects (3). This tree is then passed to form generation facilities (4).
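The idea of a native file that is a ZIP archive holding several layers can be sketched with java.util.zip. The text does not say which entries a *.pdz file actually contains, so the entry names used in the usage note below ("document.xml", "notes.txt") are purely hypothetical, as is the class name LayeredArchive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch of a ZIP-based document format with named text layers.
class LayeredArchive {

    /** Packs the named layers into an in-memory ZIP archive. */
    static byte[] pack(Map<String, String> layers) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> layer : layers.entrySet()) {
                zip.putNextEntry(new ZipEntry(layer.getKey()));
                zip.write(layer.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads every entry of the archive back into a map of layer name to text. */
    static Map<String, String> unpack(byte[] archive) {
        Map<String, String> layers = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            byte[] buffer = new byte[4096];
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int count;
                while ((count = zip.read(buffer)) != -1) {
                    content.write(buffer, 0, count);
                }
                layers.put(entry.getName(), new String(content.toByteArray(), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return layers;
    }
}
```

For example, packing a map with a "document.xml" layer and a "notes.txt" layer and unpacking the result returns the same two layers; exporting for submission would then amount to writing out only the XML layer.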
Pedro can generate applications that suit desktop and TabletPC display devices. The form
generators shown in Figure 4-1 use properties of the data objects to render forms and use information held
by the ConfigurationReader to render other aspects of the application.
4.2 Service Based Aspects
The generic aspects of form generation and IO alone would not be sufficient to service many use
cases. To support domain-specific functionality, the architecture supports service interfaces that
can be implemented by developers. The following sections describe the anatomy and categories of
Pedro services.
4.2.1 Service Anatomy
Pedro services adopt a published abstract interface in order to hide implementation details from the
rest of the system. They are characterised by a task, a scope of effect and an aspect of persistence.
Each of these properties affects the way the service behaves in the application.
4.2.1.1 Task
The task describes what the service does. It is expressed to end-users through a name that would be
displayed in a menu or list item, and a description that could be displayed to show more information
about the service.
4.2.1.2 Scope of Effect
The scope of effect determines whether a service is meant to affect the current form field, the
current form and all of its subforms, or the current document. Document-level services are
advertised as menu items in the menu bar of a dialog window. Record-level services are advertised
with buttons that appear at the top of a form. Field-level services are indicated through buttons that
appear at the end of a form field.
4.2.1.3 Persistence
Services are considered persistent if one service instance is used for all requests and transient if a
new service is instantiated for each request. Transient services would tend to be those which are
simple and use little computational resources. Instantiating a new service object for each request
has the benefit of reducing undesirable side effects on other program variables. Persistent services
tend to be services which either use significant computational resources or which are meant to
exhibit a “memory” of user activities throughout a user session. For example, if a service needed to
start up a database, then it is better if the system assumes it is persistent to reduce the startup
overhead associated with multiple requests.
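The three properties can be summarised in a small sketch. The interface, the registry and all method names below are hypothetical and are not Pedro's published service interface; the sketch only shows how a host application might reuse one instance of a persistent service while creating a fresh instance of a transient service for each request.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of the service anatomy: task, scope of effect and persistence.
class ServiceSketch {

    /** Scope of effect, with the place where each kind of service is advertised. */
    enum Scope {
        FIELD("button at the end of the form field"),
        RECORD("button at the top of the form"),
        DOCUMENT("menu item in the menu bar");

        final String advertisedAs;
        Scope(String advertisedAs) { this.advertisedAs = advertisedAs; }
    }

    interface Service {
        String getName();          // task: shown in a menu or list item
        String getDescription();   // task: longer explanation
        Scope getScope();
        boolean isPersistent();
    }

    /** Minimal implementation used only for this example. */
    static class BasicService implements Service {
        private final String name;
        private final Scope scope;
        private final boolean persistent;

        BasicService(String name, Scope scope, boolean persistent) {
            this.name = name;
            this.scope = scope;
            this.persistent = persistent;
        }
        public String getName() { return name; }
        public String getDescription() { return name + " (" + scope.advertisedAs + ")"; }
        public Scope getScope() { return scope; }
        public boolean isPersistent() { return persistent; }
    }

    /** Returns a shared instance for persistent services, a new one for transient services. */
    static class ServiceRegistry {
        private final Map<String, Supplier<Service>> suppliers = new HashMap<>();
        private final Map<String, Service> persistentInstances = new HashMap<>();

        void register(String key, Supplier<Service> supplier) {
            suppliers.put(key, supplier);
        }

        Service lookup(String key) {
            Service cached = persistentInstances.get(key);
            if (cached != null) {
                return cached;
            }
            Supplier<Service> supplier = suppliers.get(key);
            if (supplier == null) {
                throw new IllegalArgumentException("unknown service: " + key);
            }
            Service service = supplier.get();
            if (service.isPersistent()) {
                persistentInstances.put(key, service); // reuse for later requests
            }
            return service;
        }
    }
}
```

A service that starts a database would be registered as persistent so that the startup cost is paid once; a simple field formatter would be transient so that no state leaks between requests.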
4.2.2 Access to Application Variables
Services can manipulate the host application in a number of ways. They can access three
collections of variables known hereafter as contexts:
• Application context, which refers to program objects that apply to all documents being
managed by the current application, e.g. schema definitions.
• Document context, which holds objects that affect a single document, e.g. the current form,
or a tree widget that shows end-users how all the form records are organised.
• Form context, which holds objects that affect the current form, e.g. the name of a currently
selected field.
The variables maintained by these contexts allow service designers to customise the UI components
and the three kinds of scope limit undesirable side effects. For example, designers could use form
context to help highlight the currently selected field in blue. However, in using the form context,
the change in colouring would not affect the same field as it may appear in other open documents.
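The scoping behaviour described above can be sketched with three simple variable collections. The classes Context and OpenDocument and the variable names are hypothetical; the point is only that one application context is shared, while each open document carries its own document and form contexts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the three kinds of context available to a service.
class ContextSketch {

    /** A named collection of variables. */
    static class Context {
        private final Map<String, Object> variables = new HashMap<>();

        void set(String name, Object value) { variables.put(name, value); }
        Object get(String name) { return variables.get(name); }
    }

    /** Each open document shares the application context but owns its other contexts. */
    static class OpenDocument {
        final Context applicationContext;
        final Context documentContext = new Context();
        final Context formContext = new Context();

        OpenDocument(Context applicationContext) {
            this.applicationContext = applicationContext;
        }
    }
}
```

If a service stores a highlight colour in the form context of one open document, the same field in a second open document is unaffected, whereas a value placed in the application context is visible from both.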
4.2.3 Service Types
Pedro supports specialised and generalised services. Specialised services use abstract interfaces
that are associated with specific kinds of data entry tasks that were identified as being common to a
number of bioinformatics applications. These services address issues such as the generation of
unique key identifiers, the mark up of form fields with controlled vocabulary terms and the
validation of form data. General services are intended to support all other kinds of tasks that are
required to customise the generic aspects of form generation. Whereas general services are always
triggered through an explicit action from end-users, some specialised services may also be triggered
as part of programmatic activities. The following sections describe the general and specialised services
that are supported by the architecture.
4.2.3.1 General Services
General services implement a general plugin interface that is described in later sections. Typically
general services will be used to manipulate the record tree representing the data. However,
developers can access variables in the Contexts to affect other parts of the application. General
services will always be explicitly activated by users by way of pressing a button or menu item.
4.2.3.2 Specialised Services
Specialised services adopt interfaces that support specific data entry tasks. As noted above, some
specialised services may be triggered as part of programmatic activities rather than through an
explicit action from end-users. The following sections describe the services supported
by the SOA.
4.2.3.2.1 Validation Services
There are three types of validation services that check the correctness of record data. Field level
validation services validate the content of a particular field. For example, they could check that the
value is a legal float number or that it matches a particular regular expression pattern. These
services would be triggered whenever the user tried to commit changes to the current record.
Record level validation services detect incorrect combinations of field values within a given
record. For example, suppose a form had fields for “gender” and “cancer_type”. A record level
validation service could be developed to detect an error for a “male” who had “cervical cancer”.
Like field level services, these services would also be triggered when the user tried to commit
changes to the current record.
Document level validation services detect data entry errors in the entire document. These services
could identify patterns in one record that don’t fit given the patterns of values found in another kind
of record. For example, a document describing an experiment could have records describing a lab
protocol that were inappropriate given the kind of sample described in another record.
Document level validation services would be triggered either when users explicitly wanted to view
errors in the current document or when they attempted to export it to some final submission format.
In the latter case, the presence of errors would prevent users from sending final draft documents to
data repositories. This feature would help ensure that submissions had a high level of data quality.
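The field-level and record-level checks described above can be sketched as follows. The class ValidationSketch, the fields weight_kg and sample_id and the sample identifier pattern are hypothetical; the gender and cancer_type rule is the example from the text. Each check returns an error message, or null when the value is acceptable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of field-level and record-level validation run on commit.
class ValidationSketch {

    static final Pattern SAMPLE_ID = Pattern.compile("S-\\d+"); // illustrative pattern

    /** Field level: the value must be a legal float number. */
    static String checkFloat(String fieldName, String value) {
        try {
            Float.parseFloat(value.trim());
            return null;
        } catch (NumberFormatException e) {
            return fieldName + ": '" + value + "' is not a legal float number";
        }
    }

    /** Field level: the whole value must match the regular expression. */
    static String checkPattern(String fieldName, String value, Pattern pattern) {
        return pattern.matcher(value).matches()
                ? null
                : fieldName + ": '" + value + "' does not match " + pattern.pattern();
    }

    /** Record level: detects an inconsistent combination of two field values. */
    static String checkGenderAgainstCancerType(Map<String, String> record) {
        boolean male = "male".equalsIgnoreCase(record.get("gender"));
        boolean cervical = "cervical cancer".equalsIgnoreCase(record.get("cancer_type"));
        return (male && cervical)
                ? "gender 'male' is inconsistent with cancer_type 'cervical cancer'"
                : null;
    }

    /** Runs the field-level checks, then the record-level check, and collects the errors. */
    static List<String> validateOnCommit(Map<String, String> record) {
        List<String> errors = new ArrayList<>();
        addIfError(errors, checkFloat("weight_kg", record.getOrDefault("weight_kg", "")));
        addIfError(errors, checkPattern("sample_id", record.getOrDefault("sample_id", ""), SAMPLE_ID));
        addIfError(errors, checkGenderAgainstCancerType(record));
        return errors;
    }

    private static void addIfError(List<String> errors, String message) {
        if (message != null) {
            errors.add(message);
        }
    }
}
```

A document-level service would follow the same pattern but would receive the whole record tree, and a non-empty error list would block export to a submission format.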
4.2.3.2.2 Ontology Services
An ontology service allows end-users to mark up a form field with terms that come from a
controlled list of terms. Although its scope of effect is a single form text field, it may use other
information about the current form, the user or document to help constrain the choices of terms it
provides to the viewer. This kind of service is described more in Chapter 6.
4.2.3.2.3 ID Generator Service
This service generates a unique key value for a form text field. Whereas ontology services help
provide values that carry semantic significance, ID generator services simply provide keys to
uniquely identify a record with respect to other records in the same document or perhaps even
records within other documents. For example, they could be used to uniquely tag samples in an
experiment. Implementations of the ID Generator Service interface could provide a key that did not
already appear in a database or records.
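A minimal sketch of such a service is shown below. It assumes the generator is told which keys already exist in the document and simply skips over them; the class name SequentialIdGenerator and the prefix-plus-counter key format are hypothetical choices, not Pedro's implementation.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of an ID generator that never returns a key already in use.
class SequentialIdGenerator {
    private final String prefix;
    private final Set<String> usedKeys;
    private int counter = 0;

    SequentialIdGenerator(String prefix, Set<String> existingKeys) {
        this.prefix = prefix;
        this.usedKeys = new HashSet<>(existingKeys); // copy so the caller's set is untouched
    }

    /** Returns the next key of the form prefix-N that has not been used before. */
    String generateKey() {
        String candidate;
        do {
            counter++;
            candidate = prefix + "-" + counter;
        } while (!usedKeys.add(candidate)); // add() returns false if the key already exists
        return candidate;
    }
}
```

An implementation backed by a database would replace the in-memory set with a query, which is one reason such a service might be marked persistent.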
5 Description of Subsystems
Section 4 provided a high-level overview of the architecture for Pedro. The following sections
cover aspects of the design in far greater detail.
5.1 Schema Reader
5.1.1 Purpose
The schema reader is a Java class that implements “SchemaReaderInterface”, and is responsible for
using XML schema properties to create template record definitions. These definitions
are instantiated whenever files are read or when users create new record objects in a data set.
5.1.2 Description
Pedro uses a schema reader that interprets model properties and uses the values to create templates
of native data structures that are instantiated to hold data. The application interacts with an
interface “SchemaReaderInterface” which can be implemented to interpret XML Schemas or other
types of models. Currently, the main schema reader class is
pedro.mda.schema.MSVSchemaReader.
Every schema reader follows the same algorithm:
1. extract model properties
2. use properties to set attributes in classes used to produce template records (eg:
EditFieldModel, ListFieldModel, RecordModel)
3. set additional attributes of template records using properties read from the Configuration
Reader
4. submit template record definition to RecordModelFactory
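The four steps above can be sketched with a DOM parser for a deliberately simple case: a flat schema whose top-level elements each declare a sequence of named fields. FieldTemplate, SchemaReaderSketch and the tool tip map are hypothetical stand-ins for Pedro's template classes, Configuration Reader and RecordModelFactory; fields declared with ref rather than name are not handled.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch of the schema reader algorithm for a flat XML Schema.
class SchemaReaderSketch {
    static final String XS = "http://www.w3.org/2001/XMLSchema";

    /** Simplified stand-in for a template field; not Pedro's actual class. */
    static class FieldTemplate {
        final String name;
        final boolean required;
        String toolTip = "";

        FieldTemplate(String name, boolean required) {
            this.name = name;
            this.required = required;
        }
    }

    /** Returns record class name mapped to its field templates (the "factory" contents). */
    static Map<String, List<FieldTemplate>> read(String schemaText, Map<String, String> toolTips) {
        Map<String, List<FieldTemplate>> factory = new LinkedHashMap<>();
        try {
            DocumentBuilderFactory builders = DocumentBuilderFactory.newInstance();
            builders.setNamespaceAware(true);
            Document schema = builders.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(schemaText)));
            NodeList topLevel = schema.getDocumentElement().getChildNodes();
            for (int i = 0; i < topLevel.getLength(); i++) {
                if (!(topLevel.item(i) instanceof Element)) continue;
                Element record = (Element) topLevel.item(i);
                if (!XS.equals(record.getNamespaceURI()) || !"element".equals(record.getLocalName())) continue;
                List<FieldTemplate> fields = new ArrayList<>();
                NodeList declared = record.getElementsByTagNameNS(XS, "element");
                for (int j = 0; j < declared.getLength(); j++) {
                    Element field = (Element) declared.item(j);
                    // Step 1: extract model properties (minOccurs defaults to 1).
                    String name = field.getAttribute("name");
                    boolean required = !"0".equals(field.getAttribute("minOccurs"));
                    // Step 2: use them to set attributes of a template.
                    FieldTemplate template = new FieldTemplate(name, required);
                    // Step 3: add properties that come from configuration.
                    template.toolTip = toolTips.getOrDefault(name, "");
                    fields.add(template);
                }
                // Step 4: submit the template record definition.
                factory.put(record.getAttribute("name"), fields);
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not read schema", e);
        }
        return factory;
    }
}
```

Because the rest of the application only sees the resulting templates, the DOM parser here could be swapped for another model parser without affecting form generation.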
5.1.3 Design History
Pedro’s schema reader has been reworked twice since its initial release. The first version used a
DOM parser to scan for particular structures in an XML schema. The decision to have Pedro rely
on its own native data structures instead of generic ones like DOM objects helped insulate it from
the implementation details of specific kinds of model parsing technologies. The model parser could
be substituted provided it could provide enough properties to create template record definitions.
This became an important benefit when I later encountered a company which used a different kind
of data modelling language than XML schema.
In 2003, I met representatives from Epistemics (http://www.epistemics.co.uk), a company that had
developed knowledge acquisition software. PCPack allowed data modellers to graphically model
knowledge that they elicited from domain experts during structured interviews. Their software was
being used in the aerospace industry to model the relationships and properties of various aircraft
parts. Although their software supported data modelling, it did not have a feature which could
generate prototype software applications from the models.
We expressed interest in collaborating with one another. I realised that in order to accommodate
using both an XML schema reader and a schema reader which read their own bespoke XML file
formats, I had to develop an abstract interface called “SchemaReaderInterface”. Pedro’s schema
reader was reworked so that the rest of the program communicated with it via an interface rather
than with a specific implementation.
A schema reader was successfully developed to interpret certain kinds of models developed in
PCPack. Once again, the procedure was to interpret the schema once and use the model
information to produce templates of Pedro’s native data structures.
Due to resource constraints, the collaboration didn’t proceed further. However, the work allowed
an interface to be developed in a generic way that could accommodate other implementations of the
schema reader.
In 2004, the EBI’s Kai Runte offered to rewrite the schema reader. This time, the schema reader
would rely on Sun Microsystems’ MSV Schema Reader. MSV was designed to parse a wide
variety of schemas and provide information about them in syntax trees. The algorithm for creating
the templates was the same, but a wider range of schemas could be supported.
The MSVSchemaReader developed by Kai was tested on the MAGE-ML XML Schema, which had
been mechanically generated from a DTD definition. We were able to load legacy MAGE-ML data
files successfully and show that data files could validate against the schema.
MSVSchemaReader has been used for the last couple of years to drive all the other Pedro products.
There are some deficiencies with it. Although the class clearly uses a Visitor pattern, the code is
very complicated. The schema reader only interprets about 11 out of 44 possible XML schema
structures, so it remains unable to understand a variety of models developed outside Manchester
University.
Kai left the project in 2006 and since then we’ve had no more in-house knowledge about how it
works. We know that it *does* work, but we have been reluctant to change anything because so
much of the code base relies on Pedro’s template record definitions.
By the Summer of 2006, it was clear that the limitations of the schema reader presented a barrier to
widespread uptake of the tool in other communities. The development team recognised that
although a great variety of models could be accommodated by the tool, the schema reader needed to
be enhanced.
A decision was made to replace rather than enhance the MSVSchemaReader class. Although it is a stable
product, MSV appears to be maintained by one developer and it now seems a bit dated. We felt it
was a strategic gain to modify Pedro so that it worked with a more modern schema reader
technology.
In the Spring of 2007, Chris Garwood evaluated a number of different schema reader technologies.
We’re currently in the process of trying to replace the MSVSchemaReader class with one that uses
Castor.
Castor generates Java class definitions based on record definitions provided in an XML schema.
Using Castor with Pedro will require that the schema reader use Java’s Reflection facilities to
interpret properties of generated classes. These properties will be used to create template record
definitions in the same way as in the previous schema reader implementations.
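As a rough illustration of the reflection step, the sketch below lists the properties of a class by inspecting its public getter methods. The nested Sample class is a hand-written stand-in for a class that a binding tool might generate, and the assumption that generated classes expose JavaBean-style getters is exactly that, an assumption; the names ReflectionSketch and describe are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.Date;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: derive field names and types from a generated class via reflection.
class ReflectionSketch {

    /** Stand-in for a class generated from a schema record definition. */
    static class Sample {
        private String name;
        private Date collected;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public Date getCollected() { return collected; }
        public void setCollected(Date collected) { this.collected = collected; }
    }

    /** Returns property name mapped to the simple name of its type, sorted by property name. */
    static Map<String, String> describe(Class<?> generated) {
        Map<String, String> properties = new TreeMap<>();
        for (Method method : generated.getMethods()) {
            String methodName = method.getName();
            boolean getter = methodName.startsWith("get")
                    && methodName.length() > 3
                    && method.getParameterCount() == 0
                    && method.getDeclaringClass() != Object.class; // skips getClass()
            if (getter) {
                String property = Character.toLowerCase(methodName.charAt(3)) + methodName.substring(4);
                properties.put(property, method.getReturnType().getSimpleName());
            }
        }
        return properties;
    }
}
```

Each entry of the resulting map could then be turned into a field of a template record definition, for example mapping a Date property to a date field.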
XML Schema properties which are used to set attributes in Pedro’s native data structures are
described in detail in Section 5.2.
5.1.3 Scope of Effect
The Schema Reader is only used once when an application starts. Its job is to help create templates
of Pedro’s native data structures. These templates will be instantiated to hold the information from
a data set. Replacing the schema reader will require rigorous unit testing because of the potential
for errors when the schema reader uses model properties to set attributes in the template record
definitions. The activity lends itself to automated testing and will not require testing of the
application itself. For example, JUnit test cases could run the schema reader and then verify
that records have the expected number and types of fields.
5.1.4 Relevant Code Packages
This subsystem depends on the following packages and classes:
• pedro.mda.schema.*
• pedro.mda.config.*
• pedro.mda.model.*
Currently, the entire suite of Pedro tools depends on the work of
pedro.mda.schema.MSVSchemaReader, which implements
pedro.mda.schema.SchemaReaderInterface. The schema reader interprets schema
properties and sets attributes in templates of ListFieldModel, EditFieldModel,
AttributeFieldModel, RecordModel, RecordModelReference, which are all defined
in the pedro.mda.model.* package. This package also contains RecordModelFactory,
which is where the templates are registered.
MSVSchemaReader is called within pedro.mda.schema.Startup, a class that is used in
every model driven tool in the Pedro tool suite.
5.2 Native Data Structures
5.2.1 Description
The basic native data structure in Pedro is the RecordModel class, which comprises a number of
DataFieldModel objects. Figure 5-1 shows the different kinds of data fields that can be
created:
Figure 5-1: The inheritance hierarchy for data field classes supported in Pedro.
All fields will have properties such as:
• a field name
• a help link (a URL)
• whether the field is required or optional
• the kind of field view type (e.g. “RADIO_FIELD”, “DATE_FIELD”)
• whether the field is an attribute or not (this probably belongs in the EditFieldModel class)
• text to appear when an end-user hovers over the field label
The two kinds of fields are EditFieldModel, which manages a single value, and
ListFieldModel, which contains one or more RecordModel objects. An EditFieldModel will
have:
• a value represented as a string
• a default value that should appear when forms are rendered with a new record model object
• a flag indicating whether the field value should be included as part of the display name that
represents the containing record.
ListFieldModel objects will know what kinds of record models they can contain and will have
a collection of children RecordModel objects.
There are four kinds of EditFieldModel subclasses, although they have few extra properties. A
GroupFieldModel is a kind of EditFieldModel where end-users select a value from a list.
Its subclass BooleanFieldModel constrains this list of choices to “true” and “false”.
TextFieldModel is a marker class for identifying fields that can be associated with ontology
services or id generation services. An IDFieldModel is a kind of TextFieldModel which
also has an IDGeneratorService. This service creates an identifier value that can be inserted
into the form field. An IDFieldModel is the data container object that corresponds to an
attribute field in XML Schema.
A RecordModel is a collection of DataFieldModel objects. Figure 5-2 shows how the
RecordModelFactory, RecordModel, EditFieldModel and ListFieldModel
classes relate:
Figure 5-2: Aggregation relationships for Pedro native data structures
A RecordModel will contain a collection of EditFieldModel objects and a collection of
ListFieldModel objects. Each ListFieldModel can contain multiple RecordModels.
When the SchemaReader is operating, it adds RecordModel instances to the
RecordModelFactory. The instances act as templates that can be cloned whenever a new
record model is needed for a data set.
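The aggregation and template-copying behaviour can be sketched as follows. The classes below are deliberately simplified stand-ins with hypothetical names (Record, EditField, ListField, Factory); they are not the real RecordModel, EditFieldModel, ListFieldModel or RecordModelFactory classes and omit most of their properties.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified sketch of the native data structures and the template factory.
class NativeModelSketch {

    /** Holds a single value. */
    static class EditField {
        final String name;
        String value = "";
        EditField(String name) { this.name = name; }
    }

    /** Holds child records. */
    static class ListField {
        final String name;
        final List<Record> children = new ArrayList<>();
        ListField(String name) { this.name = name; }
    }

    /** A record aggregates edit fields and list fields. */
    static class Record {
        final String recordClassName;
        final List<EditField> editFields = new ArrayList<>();
        final List<ListField> listFields = new ArrayList<>();

        Record(String recordClassName) { this.recordClassName = recordClassName; }

        /** Copies the field structure and values, but starts with empty child lists. */
        Record copyTemplate() {
            Record copy = new Record(recordClassName);
            for (EditField field : editFields) {
                EditField fieldCopy = new EditField(field.name);
                fieldCopy.value = field.value;
                copy.editFields.add(fieldCopy);
            }
            for (ListField field : listFields) {
                copy.listFields.add(new ListField(field.name));
            }
            return copy;
        }
    }

    /** Templates are registered once and copied whenever a new record is needed. */
    static class Factory {
        private final Map<String, Record> templates = new HashMap<>();

        void register(Record template) {
            templates.put(template.recordClassName, template);
        }

        Record create(String recordClassName) {
            Record template = templates.get(recordClassName);
            if (template == null) {
                throw new IllegalArgumentException("no template for " + recordClassName);
            }
            return template.copyTemplate();
        }
    }
}
```

Because every new record is a copy, editing a value in one record leaves both the template and any other record created from it unchanged.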
The most popular utility class in the pedro.mda.model.* package is RecordModelUtility. It
has a number of methods to help group fields in different ways.
There is a strong relationship between properties of the data model and properties of the native data
structures. The following tables list the attributes of native data structures that should be set with
values derived from an XML schema.
5.2.1.1 RecordModel Properties
Property: record_class_name
Description: the name of the record
Property Provider: XML Schema Reader:
<xs:element name="[record_class_name]"><xs:complexType><xs:sequence>...</xs:sequence></xs:complexType></xs:element>

Property: helpLink
Description: the URL for a web page that describes the form concept
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.SchemaConceptConfiguration

Property: ontology_identifier
Description: a unique identifier that can be associated with a schema concept. This is useful if
ontology services want to use a schema concept’s ontology identifier to help limit what values are
presented to the end-users.
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.SchemaConceptConfiguration

Property: form_comments
Description: comments that appear on the form and describe the schema concept
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.SchemaConceptConfiguration

Property: tool_tip
Description: text that appears when end-users let their mouse cursor hover over a field label.
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.SchemaConceptConfiguration

Property: recordValidationServices
Description: collection of descriptions of record-level validation services
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.RecordConfiguration
5.2.1.2 DataFieldModel Properties
Property: name
Description: name of the field
Property Provider: XML Schema Reader:
<xs:element name="[name]" .../>
<xs:element ref="[name]" .../>
for edit fields, or
<xs:group ref="[name]" .../>
for list fields.
Note that for list fields, the name is the record class name of another record structure or the name of a group of record class names.

Property: isRequired
Description: determines whether a field is optional or required
Property Provider:
<... minOccurs="0" .../> means the field is optional.
<... minOccurs="1" .../> means the field is required.

Property: helpLink
Description: the URL for a web page that describes the form concept
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.SchemaConceptConfiguration

Property: fieldViewType
Description: gives an indicator of how the field should be rendered.
Property Provider:
• type="xs:string" indicates the field type is TEXT_FIELD.
• type="xs:date" indicates the field type is DATE_FIELD. The field label will be followed by the date format pattern enclosed in parentheses.
• If a field has at most three restriction values, the field type will be RADIO_FIELD. The field will be rendered with radio buttons. If there are more than three restriction values, the field type will be COMBINATION_FIELD. The field will be rendered with a dropdown list of choices.
• type="xs:anyURI" indicates the field type is URI_FIELD. The field will be rendered with a browse button that allows end-users to search for a file.
• <xs:attribute .../> indicates the field type will be ID_FIELD. In the main Pedro form, attribute fields are shown first, followed by all the other fields. An ID_FIELD will also have a “Generate Key” button that users can press to generate an identifier value.
• <xs:element ref=".." ... maxOccurs="1" .../> indicates the field will be ONE_TYPE_ONE_VALUE_LIST. This form field will have a desensitised text field and “New” and “Edit” buttons.
• <xs:element ref=".." ... maxOccurs="unbounded" .../> indicates the field will be ONE_TYPE_N_VALUE_LIST. This form field will have a scrollable list showing display names of sub-records. It will also have “New”, “Edit” and “Delete” buttons.
• <xs:group ref=".." ... maxOccurs="1" .../> indicates the field will be N_TYPE_ONE_VALUE_LIST. The form field will have a desensitised text field and a combination box that lets the user choose which type of record to create.
• <xs:group ref=".." ... maxOccurs="unbounded" .../> indicates an N_TYPE_N_VALUE_LIST field view type. The form field will have a scrollable list showing display names of sub-records. It will also have a combination box that lets the user choose which type of record to create. When users select a type of record, the list filters to show records of that type.

Property: ontology_identifier
Description: a unique identifier that can be associated with a schema concept. This is useful if ontology services want to use a schema concept’s ontology identifier to help limit what values are presented to the end-users.
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.SchemaConceptConfiguration

Property: form_comments
Description: comments that appear on the form and describe the schema concept
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.SchemaConceptConfiguration

Property: tool_tip
Description: text that appears when end-users let their mouse cursor hover over a field label.
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.SchemaConceptConfiguration
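The rules that pick a field view type for restricted and list fields can be sketched as a pair of small decision functions. The enum and class names below are hypothetical; only the constant names come from the property descriptions above. The sketch assumes that forRestrictions is called only for fields that actually carry enumeration restrictions, and that any maxOccurs value other than "1" is treated like "unbounded", which the document does not state.

```java
// Hypothetical enum; the constants mirror the field view types named in the text.
enum ViewType {
    RADIO_FIELD, COMBINATION_FIELD,
    ONE_TYPE_ONE_VALUE_LIST, ONE_TYPE_N_VALUE_LIST,
    N_TYPE_ONE_VALUE_LIST, N_TYPE_N_VALUE_LIST
}

class ViewTypeRules {
    // At most three restriction values gives radio buttons; more gives a dropdown list.
    static ViewType forRestrictions(int restrictionCount) {
        return restrictionCount <= 3 ? ViewType.RADIO_FIELD : ViewType.COMBINATION_FIELD;
    }

    // isGroupRef: true for <xs:group ref=...>, false for <xs:element ref=...>.
    // maxOccurs: the raw attribute value, for example "1" or "unbounded".
    static ViewType forListField(boolean isGroupRef, String maxOccurs) {
        // Assumption: anything other than "1" is handled as a multi-value list.
        boolean singleValue = "1".equals(maxOccurs);
        if (isGroupRef) {
            return singleValue ? ViewType.N_TYPE_ONE_VALUE_LIST : ViewType.N_TYPE_N_VALUE_LIST;
        }
        return singleValue ? ViewType.ONE_TYPE_ONE_VALUE_LIST : ViewType.ONE_TYPE_N_VALUE_LIST;
    }

    public static void main(String[] args) {
        System.out.println(forRestrictions(2));
        System.out.println(forRestrictions(7));
        System.out.println(forListField(false, "unbounded"));
        System.out.println(forListField(true, "1"));
    }
}
```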
5.2.1.3 EditFieldModel Properties
Property: defaultValue
Description: the default value that should be displayed whenever a new record containing this field is displayed.
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.EditFieldConfiguration

Property: allowFreeText
Description: determines whether a text field entry can accept free-text entries; this is not applicable for fields that have drop-down lists or radio buttons.
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.EditFieldConfiguration

Property: scrollingTextField
Description: determines whether a text field is displayed with one line of text or a scrolling text area
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.EditFieldConfiguration

Property: editFieldValidationServices
Description: validation services that can be applied to the value held by the edit field model
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.EditFieldConfiguration

Property: isDisplayNameComponent
Description: determines whether the field is used to derive the name of the containing record
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.EditFieldConfiguration

Property: units
Description: the units associated with a field; typically only of use for numeric fields
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.EditFieldConfiguration

Property: fieldValidationServiceConfiguration
Description: a collection of descriptions of field validation services
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.EditFieldConfiguration

Property: ontologyServiceConfigurations
Description: a collection of descriptions of ontology service descriptions; this is only applicable to text fields.
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.EditFieldConfiguration

Property: editingComponentClassName
Description: the class name of an editing component; this should desensitise the part of the form field that holds a value. Instead, users click on “Edit” to invoke a separate editing component
Property Provider: Pedro Configuration Tool; also see pedro.mda.config.EditFieldConfiguration
5.2.1.4 IDFieldModel Properties
Property: idGeneratorService
Description: generates an identifier value that Pedro inserts into the text field whenever the “Generate Key” button is pressed
Property Provider: Pedro Configuration Tool; see also pedro.mda.config.AttributeFieldConfiguration
5.2.1.5 GroupFieldModel Properties
Property: choices
Description: the choices provided in a drop down list
Property Provider: XML Schema Reader:
<xs:restriction base="xs:string">
<xs:enumeration value="value1"/>
<xs:enumeration value="value2"/>
</xs:restriction>
5.2.1.6 BooleanFieldModel Properties
Property: choices
Description: the choices provided in a drop down list
Property Provider: XML Schema Reader:
type="xs:boolean"
BooleanFieldModel, a subclass of GroupFieldModel, forces the choices to be “true” and “false”
5.2.1.7 ListFieldModel Properties
Property: fieldValidationServiceConfigurations
Description: a collection of list field validation services
Property Provider: Pedro Configuration Tool; see also pedro.mda.config.ListFieldConfiguration

Property: listFieldEditingComponentConfigurations
Description: a collection of descriptions of components that can create or edit different kinds of records in a list field.
Property Provider: Pedro Configuration Tool; see also pedro.mda.config.ListFieldConfiguration
5.2.2 Design History
At the outset of the project, there was a great temptation to make Pedro hold all its data in DOM
objects. This was because the main I/O routines were written using the DOM parser and the result
of the activity was a complete in-memory tree of DOM objects. There could have been some
benefit in having Pedro rely on a generic data structure that was used in other projects. However,
the DOM object model had shortcomings. The generic data objects had generic means of accessing
and changing data. The task of finding specific fields within a record was cumbersome and
required a great many looping constructs. The cumbersome nature of the DOM API led me to
develop a collection of native data structures that could better cater for the operations supported by
the application.
This proved to be a good decision because Pedro later failed to perform satisfactorily when it loaded
large data files. Performance improved dramatically when the file reading routines switched from
the DOM parser to the SAX parser. The SAX parser did not produce DOM objects, so it would not
have made sense for the rest of the code base to depend on the generic data structures.
Initially the native data structures held information about both the model and view aspects of
records and fields. This later caused severe performance problems because Swing-based field
views were created for each field whether they were being viewed or not. The structures were later
reworked in a way that rigidly separated model and view aspects. View components were only ever
generated for fields in the currently displayed record.
The next problem with the native data structures related to serialisation. The copy and paste feature
in Pedro required that all objects and the objects they referenced were serialisable. This worked
well until serialisation encountered things like ValidationService and OntologyService objects.
These were interfaces that could be implemented as Java classes that themselves were not
serialisable. I attempted to require that all services were serialisable but then decided to strip native
data structures of references to services. The only remaining artefact of the earlier approach is the
way EditFieldModel references validation services.
In the revised approach, the form generation facility would receive a form field, look up properties
in the configuration reader, and instantiate services when the fields were actually displayed on the
screen.
Still, many of the native data structures had too much code. Many of them such as RecordModel
had a number of utility methods which did things like identify different groups of fields. This
functionality was migrated to a new class called RecordModelUtility.
The result is that the data structures now hold mostly data and not information about views and
services. The exception is how DataFieldModel knows about a FieldViewType. This type is
something that is determined by the schema reader, and is used to encode rendering hints into the
model object. For example, the schema reader can tell whether a list field can support one or
multiple types of sub-records. It marks the model object with field view types such as
“ONE_TYPE_ONE_VALUE_LIST”, and “N_TYPE_ONE_VALUE_LIST”. Pedro’s
pedro.desktopDeployment.FieldViewFactory uses this field type value to determine what kind of
form field to create for visualising the field data.
5.2.3 Scope of Effect
Pedro’s native data structures are used throughout all tools in the Pedro suite.
5.2.4 Relevant Code Packages
All of the model classes are defined in the pedro.mda.model.* package.
5.3 Pedro Contexts
5.3.1 Purpose
to provide an extensible means for services to use parameter values that come from the Pedro
application or from other software applications.
5.3.2 Description
Pedro manages a number of global variables through the use of three classes:
PedroApplicationContext, PedroDocumentContext and PedroFormContext. These classes are
types of HashMaps which have a predefined set of keys representing various objects in a Pedro
application. All service interfaces supported in Pedro allow developers to access these objects.
Figure 5-3 illustrates the way the context objects relate to one another.
Figure 4-3: The relationships among Pedro context classes
The PedroApplicationContext defines a number of keys which refer to objects that apply to
all dialogs. For example, PedroApplicationContext.RECORD_MODEL_FACTORY is a key
that is associated with the RecordModelFactory object. The same factory object will be used
regardless of which service in which window is using it. It would be called from within a service as
follows:
RecordModelFactory recordModelFactory
    = (RecordModelFactory) pedroFormContext.getProperty(PedroFormContext.RECORD_MODEL_FACTORY);
Other objects have a scope limited to a single dialog.
PedroDocumentContext.NAVIGATION_TREE refers to the NavigationTree object that
displays records in the left part of a Pedro Dialog. A service could access the NavigationTree
object as follows:
NavigationTree navigationTree
    = (NavigationTree) pedroDocumentContext.getProperty(PedroDocumentContext.NAVIGATION_TREE);
PedroFormContext holds references to objects that relate to the currently displayed record.
The key PedroFormContext.CURRENT_FIELD refers to the currently active field in the form.
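One way to picture the three scopes is as keyed property stores that fall back to a wider scope when a key is not found locally. The sketch below is an assumption made for illustration, not the Pedro code: SketchContext, its parent-delegation rule and the key strings are all hypothetical, and the document does not say how the real context classes are linked.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a keyed property store that falls back to a wider-scoped parent.
class SketchContext {
    private final Map<String, Object> properties = new HashMap<>();
    private final SketchContext parent;

    SketchContext(SketchContext parent) {
        this.parent = parent;
    }

    void setProperty(String key, Object value) {
        properties.put(key, value);
    }

    // Looks in this context first, then in successively wider scopes.
    Object getProperty(String key) {
        if (properties.containsKey(key)) {
            return properties.get(key);
        }
        return parent == null ? null : parent.getProperty(key);
    }
}

class SketchContextDemo {
    // Hypothetical key names chosen to echo the keys mentioned in the text.
    static final String RECORD_MODEL_FACTORY = "recordModelFactory";
    static final String NAVIGATION_TREE = "navigationTree";
    static final String CURRENT_FIELD = "currentField";

    public static void main(String[] args) {
        SketchContext application = new SketchContext(null);
        SketchContext document = new SketchContext(application);
        SketchContext form = new SketchContext(document);

        application.setProperty(RECORD_MODEL_FACTORY, "shared factory");
        document.setProperty(NAVIGATION_TREE, "tree for this dialog");
        form.setProperty(CURRENT_FIELD, "sample_name");

        // A service holding only the form context can still reach wider-scoped objects.
        System.out.println(form.getProperty(RECORD_MODEL_FACTORY));
        // The reverse is not true: the application scope knows nothing about the form.
        System.out.println(application.getProperty(CURRENT_FIELD));
    }
}
```

Because every service receives the same kind of object, a new key can be introduced without changing any method signature along the call chain.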
5.3.3 Design History
The development of Pedro Contexts was in response to the effect that ad-hoc development requests
had on the collection of services that were supported by the tool. In some cases, a service needed to
have access to another part of the application or some additional value. Delivering an extra
parameter value to the service often required that the value was passed along through a delegation
chain of objects that had nothing to do with the service operation. This caused parameter bloat
in the methods of many classes, especially GUI-based classes.
Moreover, because different services had slightly different parameters, it wasn’t possible to take
advantage of common code properties. The more similar services are to one another, the easier it is
to write code which can service all of them.
All of the Pedro services needed an extensible way to use new kinds of information that could be
supplied by objects in the Pedro application or those produced in other systems. Hence, all services
were reworked so that they expected context objects.
In future, Pedro will support key values for objects that come from third party software products.
Data modellers will be able to add more properties using the PedroConfigurationTool.
5.3.4 Scope of Effect
In the Pedro 2.0 code base:
• PedroFormContext appears in 142 source files
• PedroDocumentContext appears in 35 files
• PedroApplicationContext appears in 70 files.
These classes are referenced extensively in other tools from the Pedro Project such as Pierre.
Adding a new key in any of the contexts will make it accessible to all services.
5.3.5 Relevant Code Packages
Pedro’s context classes are defined in the pedro.system.* package.
5.4 Validation Services
5.4.1 Purpose
to provide field-level, record-level and document-level validation services.
5.4.2 Description
Pedro supports validation services which can affect a field, record or document. Field and record
validation services are triggered whenever the end-user attempts to commit changes to the current
record. The exceptions are field-level validation activities that check whether a required edit field
has a value or if a required list field has one child record.
In the Pedro tool, document-level validation services are only triggered when end-users try to
export a data set to a final submission format or when they use the “Show Errors” feature in the
View menu.
Field-level validation services are intended to identify problems in the value of an edit field or with
the composition of child records found in a list field. Record-level validation services are intended
to identify field values which are legitimate when considered in isolation but are wrong when
considered in combination with other field values. For example, a form could have fields such as
“cancer_type=ovarian” and “gender=male” which form an illegal combination of values.
Document-level validation services are intended to identify errors that appear in disparate parts of
the same data set.
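A record-level check of the kind described above can be sketched in a few lines. This is a hypothetical illustration: CombinationCheck and its validate method are made-up names, the record is reduced to a map of field values, and errors are returned as plain strings rather than Pedro's own types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical record-level check: each value is legal alone, but not in combination.
class CombinationCheck {
    static List<String> validate(Map<String, String> fieldValues) {
        List<String> errors = new ArrayList<>();
        boolean ovarian = "ovarian".equals(fieldValues.get("cancer_type"));
        boolean male = "male".equals(fieldValues.get("gender"));
        if (ovarian && male) {
            errors.add("cancer_type=ovarian cannot be combined with gender=male");
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validate(Map.of("cancer_type", "ovarian", "gender", "male")));
        System.out.println(validate(Map.of("cancer_type", "ovarian", "gender", "female")));
    }
}
```

A field-level service could not catch this error, because it only ever sees one of the two values at a time.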
The main validation utility used in Pedro is
pedro.soa.validation.ValidationFacility, which can be used to validate a field,
record or document. The class has options for including or excluding certain types of errors from
the validation activity.
The pedro.soa.validation.* package has a small hierarchy of interfaces shown below:
Figure 5-4: the inheritance hierarchy of validation service classes
The top level class is pedro.soa.ServiceClass, which provides methods for setting and
getting parameters. It also has a method for setting the resource directory, which is the default
directory where files are expected to be found. Typically this will be the
./models/[project]/resources directory found in each model folder.
The code base needs to be corrected so that all three of document, record and field level validation
services subclass from ServiceClass. All the other validation service classes have methods that
reference “pedroFormContext”. This is a HashMap that references a wide range of objects that are
part of the Pedro application. For example, pedroFormContext has references to the
RecordModelFactory, the NavigationTree and the currently displayed RecordModel. The
HashMap allows services to use other parts of the application to inform how they proceed with
validation.
DocumentValidationService has a method which expects the root record model in a data
set. RecordModelValidationService has a similar method but it merely expects the
current record model to be passed to it. FieldValidationService has fields for setting the
field name and whether the field is required or not. The required field setting determines whether
field validation services check that a field is empty or not.
EditFieldValidationService is the same as ListFieldValidationService,
except that the former expects a String field value and the latter expects a ListFieldModel
object.
Most of the classes in pedro.soa.validation.* perform type-based error checks on fields. Subclasses
of AbstractEditFieldValidationService check for double, integer and float type
errors. Each of these services also has “bounded” versions which consider lower and upper limit
values. Figure 5-5 shows the variety of field-level validation services that are automatically
associated with the data type of a field specified in the XML schema:
Figure 5-5: inheritance hierarchy of default edit field validation services that provide basic type
checking capabilities.
The DateValidator class is special in that it relies on a static regular expression value for the
date format. The value can be defined in the Pedro Configuration Tool. Once the value is set, all
instances of the DateValidator check that a date value matches the specified format.
The StringMaskValidator class is used to validate field values against a regular expression
that is defined as a restriction in the XML schema. This provides a powerful way of validating
form fields, such as constraining values to a certain number of characters or requiring values to
have a particular prefix or naming convention.
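The idea behind mask validation can be sketched with the standard java.util.regex classes. MaskCheck is a hypothetical class, not the StringMaskValidator API, and the naming convention used in the example is made up. The sketch requires the whole value to match, which mirrors the way XML Schema pattern restrictions apply to the entire value; note that XML Schema and Java regular expression syntaxes are similar but not identical.

```java
import java.util.regex.Pattern;

// Hypothetical mask check; the real StringMaskValidator API is not shown in this document.
class MaskCheck {
    private final Pattern mask;

    MaskCheck(String regularExpression) {
        this.mask = Pattern.compile(regularExpression);
    }

    // True only when the whole field value matches the mask.
    boolean isValid(String fieldValue) {
        return mask.matcher(fieldValue).matches();
    }

    public static void main(String[] args) {
        // Made-up naming convention: the prefix "EXP-" followed by exactly four digits.
        MaskCheck check = new MaskCheck("EXP-[0-9]{4}");
        System.out.println(check.isValid("EXP-0042"));
        System.out.println(check.isValid("EXP-42"));
    }
}
```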
Many of the validation classes implement a ConstraintDescription interface. This is used
by some Pedro project tools such as Pierre to obtain a human-readable description of what a
validation service does. This text is included in auto-generated functional specifications.
In order to minimise the number of times validation services are instantiated, the
ValidationServiceRegistry manages all services created in the system. Service designers
can use the Pedro Configuration Tool to specify whether a service should be persistent or transient.
The ValidationServiceRegistry uses this configuration setting to determine whether it
creates a new instance of a validation service each time it is asked, or if it returns the same instance
of a service each time.
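The persistent and transient behaviours can be illustrated with a small registry. ServiceRegistrySketch and its methods are hypothetical and are not the ValidationServiceRegistry API; services are represented as plain objects created by a supplier.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical registry: persistent services are shared, transient ones are rebuilt each time.
class ServiceRegistrySketch {
    private final Map<String, Supplier<Object>> creators = new HashMap<>();
    private final Map<String, Boolean> persistentFlags = new HashMap<>();
    private final Map<String, Object> sharedInstances = new HashMap<>();

    void register(String name, boolean persistent, Supplier<Object> creator) {
        creators.put(name, creator);
        persistentFlags.put(name, persistent);
    }

    Object getService(String name) {
        Supplier<Object> creator = creators.get(name);
        if (creator == null) {
            throw new IllegalArgumentException("No service registered as " + name);
        }
        if (persistentFlags.get(name)) {
            // Create on first request, then hand back the same instance.
            return sharedInstances.computeIfAbsent(name, key -> creator.get());
        }
        // Transient: a new instance for every request.
        return creator.get();
    }

    public static void main(String[] args) {
        ServiceRegistrySketch registry = new ServiceRegistrySketch();
        registry.register("dateCheck", true, Object::new);
        registry.register("lookupCheck", false, Object::new);
        System.out.println(registry.getService("dateCheck") == registry.getService("dateCheck"));
        System.out.println(registry.getService("lookupCheck") == registry.getService("lookupCheck"));
    }
}
```

Sharing an instance suits services that are expensive to construct and hold no per-record state; a transient service is the safer choice when the service keeps state between calls.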
Validation facilities in Pedro are extended by the Pedro Alerts System described in Section 4.10.
To allow validation services and alerts to work together, the “validate” method for all validation
services returns a collection of Alert objects which could represent errors or warnings.
5.4.3 Design History
Originally, all the validation services dealt with errors in edit fields. As the project progressed, the
interfaces for various services became more complicated. For Pedro 2.0, all of Pedro’s service
classes were overhauled so that services had a more consistent API. One of the most important
changes was allowing service classes access to a wide range of application objects that might help
guide their activities. More on this is covered in the discussion on Pedro Contexts in Section 5.3.
5.4.4 Scope of Effect
The current schema reader associates most type checking services with record fields.
Record and field validation services are activated in the following cases:
• the “Keep” or “Done” button is pressed on the main form
• the “New” or “Edit” button is pressed in the list fields of the main form
pedro.desktopDeployment.RecordView and
pedro.desktopDeployment.ListValueButtonPanel both show examples of enacting
validation actions.
Document level validation services are triggered when users try to use the “Export to Final
Submission Format” button in the File menu or when they try to use the “Show Errors” button in
the View menu. Code that calls validation services can be found in
the FileMenu and ViewMenu classes found in both the pedro.desktopDeployment.* and
pedro.tabletDeployment.* packages.
5.4.5 Relevant Code Packages
Most classes related to validation are defined in the pedro.soa.validation.* package. Other classes
appear in pedro.soa.alerts.*. Classes that call validation services will appear in the FileMenu,
ViewMenu, RecordView and ListValueButtonPanel classes of the pedro.desktopDeployment.*
package. The pedro.tabletDeployment.* packages contain similar classes.
5.5 Ontology Services
5.5.1 Purpose
to provide a system which allows end-users to mark-up form fields using terms from multiple
ontology services.
5.5.2 Description
The Pedro Ontology Service Framework manages ontology services which collect and render
ontology terms for end-users in some kind of display.
5.5.2.1 Basic Data Structure: Ontology Term
The basic unit of information handled by the services is the OntologyTerm, which comprises a
label, a unique identifier and a collection of related terms (Figure 5-6).
Figure 5-6: ontology term data structure. All ontology terms can be represented by the
OntologyTerm class and those which can be ordered as a tree of concepts can be represented by
TreeOntologyTerm as well.
The label represents the word phrase that would be presented to an end-user in a display. The
unique identifier represents the concept referred to by the term. For example, “cat” and “le chat”
are word phrases for the same concept ‘cat’. The concept could have an identifier such as
“www.dictionary.org/cat_01”. Humans will relate to the label whereas software agents that manage
the ontology will relate to the unique identifier.
The collection of related terms does not refer to a specific type of relationship such as “has a”, or “is
a”. The way to determine the kind of relationship of related terms is covered later in this section.
TreeOntologyTerm is a subclass of OntologyTerm which also has a notion of a parent term.
It is used to support ontologies that are structured as a tree of concepts.
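The two data structures can be sketched as follows. TermSketch and TreeTermSketch are hypothetical stand-ins for OntologyTerm and TreeOntologyTerm; their fields follow the description above, but the real classes' constructors and accessors are not shown in this document, and pathFromRoot is an invented helper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for OntologyTerm: label, identifier and untyped related terms.
class TermSketch {
    final String label;
    final String identifier;
    final List<TermSketch> relatedTerms = new ArrayList<>();

    TermSketch(String label, String identifier) {
        this.label = label;
        this.identifier = identifier;
    }

    // Two labels denote the same concept when their identifiers match.
    boolean sameConceptAs(TermSketch other) {
        return identifier.equals(other.identifier);
    }
}

// Hypothetical stand-in for TreeOntologyTerm: adds the notion of a parent term.
class TreeTermSketch extends TermSketch {
    final TreeTermSketch parent;

    TreeTermSketch(String label, String identifier, TreeTermSketch parent) {
        super(label, identifier);
        this.parent = parent;
    }

    // Labels from the root of the tree down to this term.
    List<String> pathFromRoot() {
        List<String> path = new ArrayList<>();
        for (TreeTermSketch term = this; term != null; term = term.parent) {
            path.add(0, term.label);
        }
        return path;
    }
}

class TermSketchDemo {
    public static void main(String[] args) {
        TermSketch english = new TermSketch("cat", "www.dictionary.org/cat_01");
        TermSketch french = new TermSketch("le chat", "www.dictionary.org/cat_01");
        System.out.println(english.sameConceptAs(french));

        TreeTermSketch animal = new TreeTermSketch("animal", "id_animal", null);
        TreeTermSketch mammal = new TreeTermSketch("mammal", "id_mammal", animal);
        TreeTermSketch cat = new TreeTermSketch("cat", "www.dictionary.org/cat_01", mammal);
        System.out.println(cat.pathFromRoot());
    }
}
```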
5.5.2.2 Ontology Provenance
Pedro uses OntologyTerm as a lightweight data container for holding information about
ontology terms. However, when users select and use terms, Pedro attempts to find out more
information about them. In the sections covering OntologySource and OntologyViewer,
there are interface methods for “getOntologyTermProvenance”. These methods cause a
source or viewer agent to return data that describes more details about a given ontology term.
OntologyTermProvenance contains information described in the SKOS standard. Most of
these details are probably only meant to be processed by software agents.
Pedro saves the meta data about a selected ontology term in its *.meta data layer (see Section 4.9).
The provenance information is important in cases where an ontology term has been reclassified in
the same ontology, or if a term has been deprecated.
5.5.2.3 Ontology Services
The Pedro Ontology Service Framework can associate one or more ontology services with a form
field. An OntologyService will have at most one OntologySource and one
OntologyViewer (Figure 5-7). It can have one, the other or both of these kinds of objects.
Figure 5-7: an ontology service, which comprises at most one ontology source and one ontology
viewer.
An OntologySource is an agent that provides ontology terms to the system. It is an interface
which is intended to hide implementation details of how terms are read from a storage medium. For
example, the terms could be a list of words in a simple text file; a set of rows in a relational
database table; or tag values in some XML-based file. The terms could also be managed locally or
remotely. The interface is designed to shield the rest of the application from these data
management details.
An OntologyViewer visualises ontology terms for end-users and is designed to accept
OntologyTerm objects provided by an OntologySource. A viewer can present data in a
number of ways such as a simple list, a table, a graph, an image map or a collection of images.
Pedro’s use of this interface allows it to be insulated from details of how terms are rendered for the
end-users. The main purpose of the viewer is to provide the system with a collection of selected
terms that can be inserted into a form field.
If a service has both a source and a viewer specified, then terms provided by the source are given to
the viewer to render in some kind of display. If only a source is provided, then Pedro associates it
with its own default ontology viewer. If only the viewer is provided, then Pedro assumes the agent
will combine responsibilities of reading and presenting ontology terms.
The flexibility of mixing and matching source and viewer components is meant to make it easy to
integrate legacy components. With this scheme, a developer can wrap the part of an application that
parses terms; the part that views terms or both. The same application can be wrapped as a source
and a viewer, allowing the same component to be marketable for use in other ontology services.
There are a couple of reasons why developers may want to wrap just a viewer. Sometimes the
legacy application may not have a rigid separation between its model and view aspects. In this
case, the source may somehow be tied to graphical objects even though its job does not include
rendering activities. The viewer could require specialised parser routines that contain information
which is incompatible with the OntologyTerm objects provided by some other source.
Another reason is to allow the ontology viewer to take advantage of implementation details in the
ontology source. A source hides most of its implementation details from an OntologyViewer.
The viewer accesses information about the ontology via the methods in the interface used by the
source. In some cases, developers may want to expose rather than hide certain formalisms. For
example, an OWL-based ontology can support various logical arguments and can support a variety
of complex relationships. It may be desirable for expert end-users to have more features in the
viewer which take advantage of ontology terms as they are expressed in OWL rather than from the
term objects provided by a source.
5.5.2.4 Ontology Source
Figure 5-8 describes the OntologySource in detail, as well as showing some of the default
ontology sources that come bundled with the application. Most of the methods in
OntologySource take an argument “pedroFormContext” which is an instance of the class of the
same name. PedroFormContext is a HashMap that allows developers to reference parts of the
Pedro application from within the service. It is described in more detail in Section 5.3.
The major methods of the OntologySource interface are described below:

Method: isWorking
Description: a diagnostic method used by Pedro to determine whether a source is fit to use or not. The most likely causes of a failure in the source are that it can’t find some resource file it’s looking for or that it can’t connect to the Internet. The result of this method can determine whether an ontology service is listed for the end-user to use.

Method: getOntologyTermProvenance
Description: the way the source provides provenance data for a given ontology term. This method is called when users decide to select a term in the viewer. The viewer attempts to capture all the information about the term so it can be included in Pedro’s meta data file (see Section 4.9).

Method: containsTerm
Description: designed to let the OntologySource be used in other contexts such as search and retrieval services. The idea is that the same source can be used to tag a field with a term and to perform a lookup operation to check that the term is part of its ontology.

Method: getTerms
Description: returns a collection of ontology terms. This is used by viewers to render an ontology as a list.

Method: getSubOntologySource
Description: this is used to return an ontology which represents part of a larger one. For example, consider an ontology which is a very large taxonomy file of animal species. The same taxonomy could be used in a number of data forms but only a branch of the taxonomy is needed for any given form field. The parameters of the method could include anchor terms which help initialise the starting point of an ontology for a given field.

Method: getSupportedRelationships
Description: an ontology could support multiple ways of relating terms to one another. This method returns the list of relationship types used by the ontology.

Method: getRelatedTerms
Description: given an ontology term and the name of a supported relationship type, this method returns a collection of related terms.

Method: getOntologyServiceMetaData
Description: this is basic meta data information about the service that includes:
• name
• author
• description
• version
• what formalisms are supported
• a contact email
• a unique code that identifies the software agent
This method is principally called by OntologyService. By default the service calls the same method in its viewer. If the viewer isn’t present or if the viewer defers answering the request to its source, then this method is called.
TreeOntologySource extends the OntologySource interface with a simple method for
getting the root of a tree of ontology terms. Most of Pedro’s default ontology sources rely on
ontologies that can be represented as trees.
Figure 5-8: default ontology sources supported in Pedro.
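To make the getSupportedRelationships and getRelatedTerms pairing more concrete, here is a minimal in-memory sketch. RelationSourceSketch is hypothetical and does not implement the real OntologySource interface: terms are reduced to their labels, and the pedroFormContext argument taken by most of the real methods is omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory source; terms are plain labels and no context argument is used.
class RelationSourceSketch {
    // relationship type -> (term label -> related term labels)
    private final Map<String, Map<String, List<String>>> relations = new LinkedHashMap<>();

    void relate(String relationship, String term, String relatedTerm) {
        relations.computeIfAbsent(relationship, key -> new LinkedHashMap<>())
                 .computeIfAbsent(term, key -> new ArrayList<>())
                 .add(relatedTerm);
    }

    // The relationship types this ontology knows about, in registration order.
    List<String> getSupportedRelationships() {
        return new ArrayList<>(relations.keySet());
    }

    // Terms related to the given term by the named relationship; empty if none.
    List<String> getRelatedTerms(String term, String relationship) {
        Map<String, List<String>> byTerm = relations.get(relationship);
        if (byTerm == null || !byTerm.containsKey(term)) {
            return new ArrayList<>();
        }
        return new ArrayList<>(byTerm.get(term));
    }

    public static void main(String[] args) {
        RelationSourceSketch source = new RelationSourceSketch();
        source.relate("is a", "cat", "mammal");
        source.relate("has a", "cat", "tail");
        source.relate("has a", "cat", "whiskers");
        System.out.println(source.getSupportedRelationships());
        System.out.println(source.getRelatedTerms("cat", "has a"));
    }
}
```

A viewer would typically call getSupportedRelationships first and then offer the end-user a way to browse related terms one relationship type at a time.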
5.5.2.5 Ontology Viewer
OntologyViewer replicates many methods of OntologySource because the viewer may
elect to delegate to its source for certain method calls. There are two distinguishing methods of the
viewer interface (Figure 5-9). “getSelectedOntologyTerms” returns the collection of terms
the end-user has selected in the viewer display. “setOntologyTermSelectionListener”
notifies a component when the users have indicated in the viewer they want to use selected terms
for marking up a form field.
Figure 5-9: the ontology viewer interface.
5.5.2.6 Default Viewer’s Use of Introspection on Ontology Sources
In most cases, data modellers will create ontology services that use the tool’s default ontology
viewer. Although the OntologySource API provides access to terms, the source provides little
information about how to render terms. Its “getTerms(...)” method allows all ontologies to be
rendered as lists. However, to find out more rendering hints, the DefaultOntologyViewer applies
Java reflection to the class which implements OntologySource. The viewer tries to determine what
other ontology interfaces the class might implement. In Figure 5-10, the viewer interrogates
MyOntologySource, a class that implements the OntologySource interface.
Figure 5-10: marker interfaces used by ontology sources to provide rendering tips. Pedro’s
Default Ontology Viewer introspects ontology sources to determine what other interfaces the
ontology source classes support.
MyOntologySource also implements TreeOntologySource, which means the viewer can present
the ontology as either a list or as a tree. DictionaryDescriptionSupport is an interface which
indicates that most ontology terms will have both a label and a text definition. This assumption
allows the viewer to produce a “Dictionary View”, a table with fields for term and definition.
URLDescriptionSupport indicates that most ontology terms will be associated with a web page.
This assumption causes the viewer to render an html pane to show the web page. The
ImageDescriptionSupport carries an assumption that most terms will be associated with an
image. This allows the viewer to produce a view of thumbnail images. OntologyCaching
indicates the ontology source can be updated. The interface has two methods: one to determine
whether the source is outdated and another method which causes the source to update itself. The
default viewer responds to the presence of the OntologyCaching interface by rendering an
“Update” button if the ontology is out of date.
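The introspection described above can be sketched roughly as follows. This is a minimal illustration, not Pedro's actual code: the interface names come from the text, but they are declared here as empty stand-ins, and the helper supportedViews(...) and the view labels it returns are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class ViewerIntrospectionSketch {

    // Empty stand-ins for the Pedro interfaces named in the text (assumption:
    // the real interfaces declare methods that are omitted here).
    interface OntologySource {}
    interface TreeOntologySource {}
    interface DictionaryDescriptionSupport {}
    interface URLDescriptionSupport {}
    interface ImageDescriptionSupport {}

    // Hypothetical source that supports list, tree and dictionary rendering.
    static final class MyOntologySource
            implements OntologySource, TreeOntologySource, DictionaryDescriptionSupport {}

    // Inspects the implementing class and reports which views could be offered.
    static List<String> supportedViews(OntologySource source) {
        Class<?> sourceClass = source.getClass();
        List<String> views = new ArrayList<>();
        views.add("List"); // every OntologySource can be rendered as a list
        if (TreeOntologySource.class.isAssignableFrom(sourceClass)) {
            views.add("Tree");
        }
        if (DictionaryDescriptionSupport.class.isAssignableFrom(sourceClass)) {
            views.add("Dictionary");
        }
        if (URLDescriptionSupport.class.isAssignableFrom(sourceClass)) {
            views.add("Web Page");
        }
        if (ImageDescriptionSupport.class.isAssignableFrom(sourceClass)) {
            views.add("Thumbnails");
        }
        return views;
    }

    public static void main(String[] args) {
        System.out.println(supportedViews(new MyOntologySource()));
    }
}
```

Because the hints are carried by marker interfaces, a data modeller can enable an extra view simply by adding another interface to the source class; the viewer code itself does not change.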
5.5.2.7 The OntologyContext Object
Ontology services can ask Pedro questions about what else is on the current form. The
OntologyContext object retains knowledge about the currently selected field, the field which
invoked an ontology service, the parent record of the current form record and the field values that
appear on the form.
An ontology source or ontology viewer can access this object through the following call:
OntologyContext ontologyContext
= (OntologyContext)
pedroFormContext.getProperty(PedroFormContext.ONTOLOGY_CONTEXT);
Developers can use the OntologyContext object to help reduce the number of terms that are
presented to the user. An example of this is provided in the “ontology” model example that comes
in the distribution bundle’s ./other_models directory.
5.5.2.8 A Walkthrough for Selecting an Ontology Term
The process of marking up a form field with a term begins with the end-user right clicking over a
starred form field label. In the desktop application, the label will belong to
pedro.desktopDeployment.TextFieldView. That label will be associated with the
TextFieldView’s instance of OntologyServiceManager. This class manages the task of presenting a
right click menu and inserting terms into the text component of the form field.
The OntologyServiceManager listens to the right click action on the form label and generates a
popup menu. It will add a menu item for each ontology service registered for the field. If the form
field does not allow free-text entry, then a “Clear” menu button is added to the popup menu.
In createMenuItemForService(...), Pedro works out what to do in cases where a service has only
an OntologySource, only an OntologyViewer or both. When an ontology has fewer than 20
terms, the OntologyServiceManager tries to render the terms as menu items. When there are
between 21 and 40 terms, the manager object tries to render submenus. For larger ontologies, it
simply makes a menu button that causes the ontology viewer to pop up.
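The size-based choice of menu rendering could be sketched as below. The enum, the method name chooseStyle(...) and the class are hypothetical. The text does not say how an ontology of exactly 20 terms is treated, so the sketch assumes it falls into the submenu band.

```java
final class MenuStyleSketch {

    // Hypothetical names for the three rendering strategies described in the text.
    enum MenuStyle { MENU_ITEMS, SUBMENUS, VIEWER_BUTTON }

    // Picks a rendering strategy from the number of terms in the ontology.
    static MenuStyle chooseStyle(int termCount) {
        if (termCount < 20) {
            return MenuStyle.MENU_ITEMS;   // small ontology: one menu item per term
        }
        if (termCount <= 40) {
            return MenuStyle.SUBMENUS;     // assumption: exactly 20 terms lands here
        }
        return MenuStyle.VIEWER_BUTTON;    // large ontology: open the ontology viewer
    }

    public static void main(String[] args) {
        System.out.println(chooseStyle(5));
        System.out.println(chooseStyle(30));
        System.out.println(chooseStyle(500));
    }
}
```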
When the user chooses a service from the menu, the OntologyServiceManager creates a
DefaultOntologyTermSelectionListener to listen for when the end-user uses terms selected in
the ontology viewer.
The OntologyViewer allows the end-user to select terms in some kind of display. The viewer will
have some button which will indicate that the user wants to insert selected terms into the form field.
This is when the OntologyViewer notifies the DefaultOntologyTermSelectionListener. The
DefaultOntologyTermSelectionListener asks the OntologyViewer to supply it with the
selected terms. It then asks OntologyViewer to provide an OntologyTermProvenance object for
each selected OntologyTerm. The OntologyTermProvenance objects are added to the
OntologyContext object which keeps track of what terms have been used in the current form. The
OntologyContext in turn submits the OntologyTermProvenance objects to the
OntologyTermProvenanceManager, which retains knowledge about all ontology terms used to tag
the data set. The DefaultOntologyTermSelectionListener also inserts the label for each
selected term into the form text field.
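The hand-off between viewer, listener, context and provenance manager can be summarised in a small sketch. All classes below are simplified stand-ins that borrow names from the text; the method names getProvenance(...), addProvenance(...) and termsChosen(...), the comma-separated field text, and the fixedSelectionViewer(...) helper are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class TermSelectionSketch {

    static final class OntologyTerm {
        final String identifier;
        final String label;
        OntologyTerm(String identifier, String label) {
            this.identifier = identifier;
            this.label = label;
        }
    }

    static final class OntologyTermProvenance {
        final OntologyTerm term;
        final String ontologyName;
        OntologyTermProvenance(OntologyTerm term, String ontologyName) {
            this.term = term;
            this.ontologyName = ontologyName;
        }
    }

    interface OntologyViewer {
        List<OntologyTerm> getSelectedOntologyTerms();
        OntologyTermProvenance getProvenance(OntologyTerm term); // hypothetical method name
    }

    // Retains provenance for every term used to tag the data set.
    static final class OntologyTermProvenanceManager {
        final List<OntologyTermProvenance> allProvenance = new ArrayList<>();
    }

    // Tracks terms used on the current form and forwards them to the manager.
    static final class OntologyContext {
        final List<OntologyTermProvenance> usedOnCurrentForm = new ArrayList<>();
        private final OntologyTermProvenanceManager manager;
        OntologyContext(OntologyTermProvenanceManager manager) {
            this.manager = manager;
        }
        void addProvenance(OntologyTermProvenance provenance) {
            usedOnCurrentForm.add(provenance);
            manager.allProvenance.add(provenance);
        }
    }

    // Plays the role of DefaultOntologyTermSelectionListener.
    static final class SelectionListener {
        final StringBuilder fieldText = new StringBuilder(); // stands in for the form text field
        private final OntologyContext context;
        SelectionListener(OntologyContext context) {
            this.context = context;
        }
        // Called by the viewer when the user asks to insert the selected terms.
        void termsChosen(OntologyViewer viewer) {
            for (OntologyTerm term : viewer.getSelectedOntologyTerms()) {
                context.addProvenance(viewer.getProvenance(term));
                if (fieldText.length() > 0) {
                    fieldText.append(", "); // assumption: labels are comma-separated
                }
                fieldText.append(term.label);
            }
        }
    }

    // Test helper: a viewer whose selection is fixed in advance.
    static OntologyViewer fixedSelectionViewer(List<OntologyTerm> selected, String ontologyName) {
        return new OntologyViewer() {
            @Override public List<OntologyTerm> getSelectedOntologyTerms() {
                return selected;
            }
            @Override public OntologyTermProvenance getProvenance(OntologyTerm term) {
                return new OntologyTermProvenance(term, ontologyName);
            }
        };
    }
}
```

Only the label reaches the form field; the identifier travels separately inside the provenance object, which is what later allows it to be written to the meta-data layer.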
When the end-user saves the data set to a native format *.pdz file, the OntologyTermProvenance
objects held in the OntologyTermProvenanceManager are written to the *.meta file. This is how
Pedro stores information about the ontology terms used to mark-up form fields.
5.5.3 Design History
The first services in POSF were simple pull down menus which presented terms to biologists. They
would select a term and it would appear in the appropriate text field. The choices came from
enumeration types which were described in the schema. As the word lists for some fields grew
larger and larger, a new way of providing controlled vocabulary terms to users was needed.
5.5.3.1 Decoupling Controlled Vocabularies from Data Models
The expressivity of an XML Schema’s enumeration types was too limited for listing dozens or
hundreds of terms, some of which could be related hierarchically to one another. This provided the
first compelling reason to make the design for Pedro decouple the schema from mark-up services.
The second reason was that large community ontologies such as KEGG, MGED, GO and others
evolved independently of one another and independently of whatever data entry models were
associated with them. By necessity, the first Architecture Decision became:
POSF Decision 1: The data entry schema and the mark-up services will evolve
autonomously at different rates. Therefore, decouple these things and support them through
separate mechanisms.
This meant that Pedro would not be driven by a single monolithic data model which described form
concepts as well as the values used to populate the fields. The design had to assume that the data
entry schema could be developed before, during or after the development of corresponding mark-up
services. The order of development appears to greatly influence what concepts are included or
excluded in either kind of model.
The scope of terms provided by a single service may not provide an appropriate match for the
meaning of a form field. The generality of some form fields could warrant supporting mark-up
from more than one service. Conversely, a large existing ontology could be used to populate
multiple form fields. Therefore, there is an M:N relationship between mark-up services and form
fields. This led to the next decision:
POSF Decision 2: The framework should be able to associate multiple mark-up services
with the same form field.
5.5.3.2 Support for Stub Ontologies for Rapid Prototyping
Pedro was soon redesigned so that form fields could be linked with text files which contained
simple term lists. This feature proved popular with end-users during the rapid prototyping phase of
their use cases.
By this time, Pedro was being used to rapidly elicit requirements to support data entry activities.
Changes made to the data model could be instantly reflected in changes to the forms. This feature
allowed end-users to provide feedback on the model by trying to fill in the data entry forms with
real data. This allowed non-technical biologists to participate in the modelling process.
As part of that process, users would comment on sample key words that were used to fill in a form
field. Initially choices for a field might be encoded as enumeration types in the schema. As users
suggested more possible terms, the enumerations were removed from the schema and the terms
began to be managed in small text files. After a while, users wanted to see their words as a
hierarchy. Pedro’s design was altered to allow a data modeller to choose whether a text form field
was associated with a single column text file or a tab indented text file.
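A tab-indented term file of the kind described here can be read into a tree with very little code, which is part of why such stub ontologies suit rapid prototyping. The sketch below is an assumption about how such a file might be parsed, not Pedro's parser: the TermNode class, the synthetic "root" node and the rule that one leading tab equals one level of depth are all illustrative choices.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class TabIndentedTermsSketch {

    static final class TermNode {
        final String label;
        final List<TermNode> children = new ArrayList<>();
        TermNode(String label) {
            this.label = label;
        }
    }

    // Builds a tree from lines where each leading tab means one level of nesting.
    // A single-column file (no tabs) simply yields a flat list under the root.
    static TermNode parse(List<String> lines) {
        TermNode root = new TermNode("root"); // synthetic root, not a real term
        Deque<TermNode> path = new ArrayDeque<>();
        path.push(root);
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            int depth = 0;
            while (depth < line.length() && line.charAt(depth) == '\t') {
                depth++;
            }
            // Unwind until the top of the stack is the parent for this depth.
            while (path.size() > depth + 1) {
                path.pop();
            }
            TermNode node = new TermNode(line.trim());
            path.peek().children.add(node);
            path.push(node);
        }
        return root;
    }

    public static void main(String[] args) {
        TermNode root = parse(Arrays.asList("hormone", "\tsteroid", "\t\ttestosterone", "enzyme"));
        System.out.println(root.children.size() + " top-level terms");
    }
}
```

A line indented more deeply than its predecessor allows is attached to the deepest available node rather than rejected; a stricter parser could report it as an error instead.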
The important lesson learned from the first attempt at creating ontology services was that there was
great value in having the system support simple ontologies which could be evolved through a text
editor. Users’ feedback on the controlled vocabulary could help inform ontology designers how best
to manage these bespoke collections of terms in a more sophisticated network of concepts.
Architecture Decision 3 became:
POSF Decision 3: The framework should support simple stub ontologies that can be used
during rapid prototyping activities.
5.5.3.3 Basing the Framework on Identifiers Instead of Word Phrases
I thought using inserted phrases was sufficient for marking up form fields but then I began to learn
more about using ontologies. The mark-up facility was limited to ensuring that values appearing in
the form fields were spelled correctly. Although this ensured that lexical searches would encounter
fewer typographical mistakes, the services did not attempt to capture the meaning of the terms. For
example, a service could insert the phrase “testosterone” into a field but not record whether it was
being regarded as a steroid or a hormone.
The ontologists with whom I consulted suggested that the services base functionality on ontology
identifiers, not specific word phrases. In the example, an identifier such as
http://www.medicalontology.org/version1/1056 could uniquely identify the steroid and
http://www.medicalontology.org/version/4567 could uniquely identify the hormone by the same
name.
If data sets were tagged with ontology identifiers, then ontology services could apply sophisticated
reasoning to provide more concise search results or results that were tagged with related terms. The
reliance on identifiers also allowed Pedro to support controlled vocabularies in multiple languages
such as English, Spanish and French. Each language would have different word phrases for the
same concept, but the concept could be uniquely identified by software agents that supported
semantic searches. The potential benefits of basing services on ontology identifiers led to the first
requirement for a redesigned ontology framework:
POSF Decision 4: Base ontology services on ontology identifiers, not word phrases. Each
ontology identifier will be associated with a word phrase, and optionally a definition, a URL
that may describe a help web page, or an image.
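A term type built around Decision 4 might look like the sketch below. The class shape and accessor names are assumptions rather than Pedro's actual OntologyTerm; the identifiers are the example URLs used earlier in this section. Equality is defined on the identifier alone, so two terms that share a label remain distinct, while the same concept labelled in two languages compares as equal.

```java
import java.util.Objects;

final class IdentifierTermSketch {

    static final class OntologyTerm {
        private final String identifier; // machine-readable, required
        private final String label;      // human-readable word phrase, required
        private final String definition; // optional; null when absent

        OntologyTerm(String identifier, String label, String definition) {
            this.identifier = Objects.requireNonNull(identifier);
            this.label = Objects.requireNonNull(label);
            this.definition = definition;
        }

        String getIdentifier() { return identifier; }
        String getLabel() { return label; }
        String getDefinition() { return definition; }

        // Two terms denote the same concept when their identifiers match,
        // whatever their labels say.
        @Override public boolean equals(Object other) {
            return other instanceof OntologyTerm
                && identifier.equals(((OntologyTerm) other).identifier);
        }

        @Override public int hashCode() {
            return identifier.hashCode();
        }
    }

    public static void main(String[] args) {
        OntologyTerm steroid = new OntologyTerm(
            "http://www.medicalontology.org/version1/1056", "testosterone", null);
        OntologyTerm hormone = new OntologyTerm(
            "http://www.medicalontology.org/version/4567", "testosterone", null);
        System.out.println(steroid.equals(hormone));
    }
}
```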
5.5.3.4 Supporting Multiple Formalisms
By the first part of 2003, more sophisticated ontology technologies were maturing. DAML+OIL
became popular and later this language influenced the development of its successor language OWL.
These languages allowed ontology designers a more sophisticated means of organising and relating
concepts. Along with the languages came software tools that allowed ontologists to relate domain
concepts.
The next major Architecture Decision for POSF was to determine whether it was better to support
ontologies that used one technology or ontologies that used multiple technologies. Single column
and tab-indented text files were proving invaluable for rapid-prototyping efforts so it seemed clear
the system should support these data formats for storing ontology terms. However, they seemed
limited in their ability to support more sophisticated ways of finding the right terms to use for a
form field.
The emerging ontology technologies promised more powerful mark-up services that employed
automated reasoning to derive new relationships amongst terms. Ontology reasoners could apply
constraints to help limit the number and kind of mark-up terms that could be presented to an
end-user. The feature could lend an air of artificial intelligence to Pedro which would allow the tool to
guide users through data entry.
The spectrum of possible ontology formalisms ranged from simple tab indented lists to very
sophisticated OWL ontology files. It seemed that if only simple term lists were supported, then data
sets would only ever be retrieved as a result of lexical searches. There seemed to be great
advantages in using the new ontology technologies. However, I had a number of concerns about
committing Pedro’s design to them.
First, in 2003 and 2004, the DAML+OIL and OWL ontology technologies seemed to be in
transition. If core code in Pedro were exposed to aspects of these formalisms, then program
maintenance would become dependent on changes that were made to the other technologies.
Second, the power promised by ontologies seemed to be matched by the skill level required to
create and maintain them. The development of advanced ontologies seemed to require detailed
knowledge of areas such as description logics. During that time, ontology research
seemed to be promoted at a few key campuses across the world, and the development of ontologies
seemed to be centred in a few bioinformatics groups.
I concluded that knowing how to build advanced ontologies was a craft that would lie outside the
domain of expertise for staff on a typical bioinformatics project.
Many labs would therefore either have to invest resources in training their own people about
ontologies or outsource expertise on this topic to a few research centres. Manchester happens to be
one such centre of excellence on ontologies, and its ontologists made themselves available to
answer my questions. However, I felt that the tool would enjoy a broader uptake in the community if
the software minimised its dependencies on institutional products and specialised technologies.
Different groups would want to choose which formalism they wanted. The choice of technology
could reflect legacy needs, biases towards emerging technologies or different levels of effort spent
learning how to use one formalism or another. There also seemed to be a need to support services
that were suited for rapid prototyping or for production purposes. These observations led to the
next Architecture Decision:
POSF Decision 5: Support multiple formalisms. Do not limit support either for very simple
or very sophisticated ontologies.
Supporting multiple formalisms necessitated the development of interfaces which would allow
Pedro to interact with multiple mark-up services in a uniform way. The interfaces would shield the
main application from details about how ontology terms were managed or related.
The benefit of this approach is that data modellers can substitute ontology services without affecting
the design of the data entry schema or the rest of the program that renders it as forms. Using an
adapter design pattern, Pedro could interact with services which relied on something as simple as a
single column text file or as sophisticated as an OWL file.
5.5.3.4 Decoupling Aspects of Model and View in an Ontology Service
It was apparent from the early stages of the Pedro project that the people who maintained ontologies
had different needs than the people who used them. Ontologists use a variety of software tools to
build their ontologies. They may edit text files with WordPad, make acyclic graphs of terms using
DagEdit, or create other ontologies using tools such as OWLEditor. Each tool may store terms in a
different file format. A generic interface for mark-up services had to account for the different
sources which can provide terms. The interface would be designed for the benefit of people
maintaining the ontologies.
Visualising those terms is a separate concern. Ontology terms can be presented to users in a
number of ways including a list, a tree, a table or some other form of graphical display. The
interface would also account for the different ways of rendering terms and be designed for the
benefit of people using the ontologies.
This led to the next architecture decision:
POSF Decision 6: Let each ontology service comprise one or both of an OntologySource and
an OntologyViewer. Each of these objects is described by an interface. An
OntologySource provides terms and is designed on behalf of those who maintain ontologies.
An OntologyViewer renders terms provided by the OntologySource, and is designed on
behalf of those who use ontologies. The ontology service may be configured to mix and
match an OntologySource with an OntologyViewer.
5.5.3.5 Consider Local and Remote Ontology Sources
Some community groups such as MGED post the latest version of their ontology on a web site. In
other use cases, ontologies are locally maintained word lists. To support both cases, the services
would have to consider collections of terms that are maintained locally or remotely. The next
decision became:
POSF Decision 7: make the design of an OntologySource consider whether terms are
maintained locally or remotely.
5.5.3.6 Accommodate Updating in Ontology Sources
Following on from the previous design decision, it is reasonable to expect that an ontology source
could become outdated. The framework needed some mechanism for asking an OntologySource
whether it needed to be updated. In some cases, updating could be done automatically to allow
Pedro to present users with the latest terms. In other cases, a local ontology could contain locally
evolved terminology or it could present the version of an ontology that a laboratory was most
confident in using. Here, updates should not be automatic but should be left to the discretion of the
end-users. This led to:
POSF Decision 8: the framework should provide some way of determining whether an
OntologySource needs to be updated. End-users should be able to decide whether the
ontology service updates itself.
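Decision 8 can be illustrated with a small sketch of an updatable source. The text says only that the OntologyCaching interface has one method to test whether the source is outdated and another to make it update itself, so the method names isOutdated() and update(), the version-number comparison and the helper shouldShowUpdateButton(...) are assumptions.

```java
final class OntologyCachingSketch {

    // Stand-in for the OntologyCaching interface; method names are assumed.
    interface OntologyCaching {
        boolean isOutdated();
        void update();
    }

    // Hypothetical source that compares a local version number with a remote one.
    static final class VersionedSource implements OntologyCaching {
        private int localVersion;
        private final int remoteVersion;

        VersionedSource(int localVersion, int remoteVersion) {
            this.localVersion = localVersion;
            this.remoteVersion = remoteVersion;
        }

        @Override public boolean isOutdated() {
            return localVersion < remoteVersion;
        }

        @Override public void update() {
            localVersion = remoteVersion; // a real source would re-fetch its terms here
        }
    }

    // The viewer offers an "Update" button only for outdated, updatable sources.
    // Nothing is updated until the end-user presses it.
    static boolean shouldShowUpdateButton(Object source) {
        return source instanceof OntologyCaching
            && ((OntologyCaching) source).isOutdated();
    }

    public static void main(String[] args) {
        VersionedSource source = new VersionedSource(1, 2);
        System.out.println(shouldShowUpdateButton(source));
        source.update();
        System.out.println(shouldShowUpdateButton(source));
    }
}
```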
5.5.3.7 Provide Meta Data about Ontology Services
As ontologies evolve, it is important to keep track of what versions were used to tag data sets.
Therefore:
POSF Decision 9: require ontology services to provide meta data information about the
ontologies. This information should include the name, author, version, description and kind
of formalism supported by an ontology.
A couple of years ago, I inquired about what standard there was for describing aspects of an
ontology service. From my investigation it seemed like the ontology community didn’t have a clear
idea of what kind of meta-data should be gathered about a term or a service. This led me to guess
what kinds of attributes should be recorded for each term.
Because ontology identifiers aren’t meaningful to end-users, there wasn’t a reason to include them
when terms were inserted into form fields. However, the information had to be maintained
somehow.
5.5.4 Scope of Effect
The Pedro Ontology Services Framework is used in all of the Pedro tools created thus far including
the desktop Pedro application, the Tablet Pedro application, the Pedro Configuration Tool, the
Pedro Meta Data Editor, the Pierre Configuration Tool and most of the search and retrieval
applications generated by Pierre.
Within the Pedro code base, POSF features are referenced in
pedro.desktopDeployment.TextFieldView and pedro.tabletDeployment.TextFieldView.
5.5.5 Relevant Code Packages
The packages relevant to POSF include:
• pedro.soa.ontology.sources
• pedro.soa.ontology.views
• pedro.soa.ontology.provenance
The schema for ontology services is defined in the
./models/pedro_form_configuration/model/pedro_form_configuration.xsd file. The meta data
retained for each ontology term is defined in
./models/pedro_meta_data/model/PedroMetaData.xsd file. Examples of implementations of
ontology services can be found in the “ontology” model folder that comes with the download. You
can find it under the “./other_models” folder.
5.6 ID Generator Services
5.6.1 Purpose
to provide an identifier that can be used to populate an identifier field.
5.6.2 Description
The IDGeneratorService interface is described in Figure 5-11. The interface extends
ServiceClass in the same way that OntologySource, OntologyViewer,
DocumentValidationService, RecordValidationService and other service classes do.
Figure 5-11: the IDGeneratorService
IDGeneratorService has two main methods. The generateKey(...) method is used to generate
a String value that Pedro inserts into the attribute form field. excludeKey(...) is used when a
data file is read. ID values found in existing records are excluded so that the generateKey(...)
doesn’t produce a key which already exists in the data set.
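The interplay between the two methods can be shown with a minimal sketch. The real parameter lists are elided in the text, so the no-argument generateKey(), the single-String excludeKey(...), the stand-in interface SimpleIdGenerator and the prefix-plus-counter naming scheme are all assumptions.

```java
import java.util.HashSet;
import java.util.Set;

final class IdGeneratorSketch {

    // Simplified stand-in for IDGeneratorService; signatures are assumed.
    interface SimpleIdGenerator {
        String generateKey();
        void excludeKey(String key);
    }

    // Generates keys such as "LAB-1", "LAB-2", skipping any key already in use.
    static final class PrefixedCounterGenerator implements SimpleIdGenerator {
        private final String prefix;
        private final Set<String> usedKeys = new HashSet<>();
        private int nextNumber = 1;

        PrefixedCounterGenerator(String prefix) {
            this.prefix = prefix;
        }

        @Override public String generateKey() {
            String candidate;
            do {
                candidate = prefix + "-" + nextNumber++;
            } while (!usedKeys.add(candidate)); // add(...) is false if the key exists
            return candidate;
        }

        // Called for each ID found in a data file as it is read.
        @Override public void excludeKey(String key) {
            usedKeys.add(key);
        }
    }

    public static void main(String[] args) {
        PrefixedCounterGenerator generator = new PrefixedCounterGenerator("LAB");
        generator.excludeKey("LAB-1"); // found in an existing record
        System.out.println(generator.generateKey());
    }
}
```

The prefix illustrates the domain-specific naming conventions mentioned in the design history, such as including a laboratory name in each sample identifier.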
5.6.3 Design History
Many of the use cases in bioinformatics use identifiers to uniquely label experiment records. For
example, if a data set describes a number of samples, they would each get a unique identifier so
they could be processed better by analysis programs. Unlike ontology terms, identifiers don’t
have a semantic value. However, identifiers may have naming conventions which use
domain-specific phrasing. For example, a unique identifier for a sample might include the name of the
laboratory where the work was done. Pedro needed some kind of service which could generate
unique keys. This was the incentive for developing the IDGeneratorService interface.
5.6.4 Scope of Effect
ID Generator services are only ever used for attribute fields.
5.6.5 Relevant Code Packages
IDGeneratorService is defined in soa.id.IDGeneratorService. It will appear in
pedro.desktopDeployment.IDFieldView. The class name for an IDGeneratorService will be
stored in pedro.mda.config.AttributeFieldConfiguration.
5.7 Plugins
5.7.1 Purpose
to allow developers to extend the functionality of the data entry with code modules that perform
domain-specific tasks.
5.7.2 Description
Pedro supports plugins that can have a scope of effect for the current field, the current record or the
current document. To make plugins, developers must create a Java class which implements
pedro.soa.plugins.PedroPlugin. This interface is shown in Figure 5-12:
Figure 5-12: a Java class implementing the PedroPlugin interface.
getDisplayName() provides the name used to advertise the plugin in a menu, button or list.
getDescription() returns a description of what the plugin does. isWorking() is a diagnostic
method used to help Pedro determine whether a plugin should be included as a service that can be
used by the end-users. isSuitableForRecordModel(...) is used to help limit the use of the
plugin. It takes as arguments a model stamp that describes the version of the schema and a record
class name, which indicates the record type of the currently displayed record. Both parameter
values are supplied automatically by the system. Plugin developers can use the information to
determine whether it is appropriate for the tool to register the plugin to suit the current data entry
task.
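A plugin along these lines might be sketched as follows. The method names are taken from the text, but the real PedroPlugin signatures are not shown here, so the String parameters, the boolean return types, the Map used in place of a PedroFormContext and the keys stored in it are all assumptions; SimplePlugin is a stand-in, not the actual interface.

```java
import java.util.HashMap;
import java.util.Map;

final class PluginSketch {

    // Simplified stand-in for pedro.soa.plugins.PedroPlugin. Method names come
    // from the text; parameter and return types are assumptions.
    interface SimplePlugin {
        String getDisplayName();
        String getDescription();
        boolean isWorking();
        boolean isSuitableForRecordModel(String modelStamp, String recordClassName);
        void execute(Map<String, Object> formContext);
    }

    // Hypothetical plugin that only makes sense when a "Sample" record is shown.
    static final class SampleSummaryPlugin implements SimplePlugin {
        @Override public String getDisplayName() {
            return "Sample Summary";
        }
        @Override public String getDescription() {
            return "Summarises the sample record currently being edited.";
        }
        @Override public boolean isWorking() {
            return true; // nothing external to check in this sketch
        }
        @Override public boolean isSuitableForRecordModel(String modelStamp, String recordClassName) {
            return "Sample".equals(recordClassName);
        }
        @Override public void execute(Map<String, Object> formContext) {
            Object record = formContext.get("CURRENT_RECORD_MODEL"); // hypothetical key
            formContext.put("SUMMARY", "Current record: " + record);
        }
    }

    public static void main(String[] args) {
        SimplePlugin plugin = new SampleSummaryPlugin();
        Map<String, Object> formContext = new HashMap<>();
        formContext.put("CURRENT_RECORD_MODEL", "Sample 42");
        if (plugin.isWorking() && plugin.isSuitableForRecordModel("model-1", "Sample")) {
            plugin.execute(formContext);
        }
        System.out.println(formContext.get("SUMMARY"));
    }
}
```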
Plugin developers can make their plugin classes implement other marker interfaces such as
AnalysisPlugin, DataExportPlugin and DataImportPlugin. Pedro uses the information to
produce a summary of different available plugins; the results are written to the status bar of a Pedro
dialog.
To ensure that their plugins are detected by the tool, developers must follow these three steps:
1. Produce a JAR file containing the plugin classes.
2. Rename the extension of the file from *.jar to *.plugins.
3. Move the renamed file into the “lib” directory of the model folder.
Plugins are associated with different parts of the application via the Pedro Configuration Tool.
When the data entry tool is running, plugins may appear in different places. Document-level
plugins will appear in one or more of the menus in the menu bar. If plugins are associated with a
given record type, then a “Plugins...” button will appear flush with the top right of the main form
whenever end-users are editing an instance of that kind of record. If plugins are associated with a
field, the same button will appear at the end of the form field.
5.7.3 Design History
There are two significant differences between plugin systems developed for Pedro v1.9 and Pedro
v2.0:
Pedro 2.0 plugins can have a field, record or document-level scope of effect. Previous releases
only supported document and record-level plugins. Document-level plugins would only appear in
Pedro’s File Menu, and would have to implement a special “RecordImporter” interface.
The new plugins also have a different execution method. Pedro 1.9 plugins used a process(...)
method whose parameters were appropriate for the desktop deployment but not the tablet
deployment of the data entry tool. The method was renamed “execute(...)” and is now passed a single
pedroFormContext parameter which can hold as many other parameter values as developers
want. To obtain the values that were formerly passed as parameters to process(...), follow this code
example:
RecordModelFactory recordModelFactory
    = (RecordModelFactory)
      pedroFormContext.getApplicationProperty(PedroApplicationContext.RECORD_MODEL_FACTORY);
NavigationTree navigationTree
    = (NavigationTree)
      pedroFormContext.getDocumentProperty(PedroDocumentContext.NAVIGATION_TREE);
RecordModel currentRecordModel
    = (RecordModel)
      pedroFormContext.getProperty(PedroFormContext.CURRENT_RECORD_MODEL);
5.7.4 Scope of Effect
Plugins are associated with menus, records and fields via the Pedro Configuration Tool. Provided
that plugins implement the PedroPlugin interface, changes in customised services shouldn’t affect
the rest of the code base.
5.7.5 Relevant Code Packages
The Pedro plugin classes are defined in the pedro.soa.plugins.* package. Examples of Pedro
plugins can be found in the pedro.configurationTool.* package, which features plugins used in
the Pedro Configuration Tool.
5.8 Configuration System
5.8.1 Purpose
to manage options for configuring a data entry application that are not expressed in the XML
schema.
5.8.2 Description
Pedro interprets the XML Schema to determine properties of the data entry application. However,
many of the configuration options can’t be expressed in the XML Schema language, so they are
managed in a ./[model]/config/ConfigurationFile.xml. This file maintains a collection of
mappings that link schema concepts to different kinds of properties. The Pedro Configuration
System manages this file and provides configuration options to the rest of the system. It has three
main aspects:
• Pedro Configuration Tool
• ConfigurationReader class
• data structures used to manage configuration data
5.8.2.1 Pedro Configuration Tool
The Pedro Configuration Tool is an instance of the Pedro tool which has been configured with
plugins that suit configuring data entry applications. The tool has its own separate tutorial;
discussion here will be limited to describing the code that provides the functionality. Most of the
classes relevant to this discussion appear in the pedro.configurationTool.* package. The main
class for the tool is pedro.configurationTool.PedroConfigurationTool.
The Pedro Configuration Tool runs off a schema defined in the
./models/pedro_form_configuration model folder. This model describes records such as
“RecordModel” and “EditField” which hold configuration data and correspond to Pedro’s native
data structures. For example, an EditField record has a field called “help_link”, which specifies a
URL for a page that is displayed for context-sensitive help. The “EditField” record corresponds to
Pedro’s pedro.mda.model.EditFieldModel data structure.
For some of the configuration records, data modellers have to provide the name of a record or field
in a target schema. For example, the previously described “EditField” record may have
“patient_name” for the value of the field name. This is an example of a reference which is used to
link the edit field “patient_name” found in the target schema with configuration properties such as
“help_link” described in the ConfigurationFile.xml file.
To help reduce data entry errors, the Pedro Configuration Tool has a number of ontology services
and plugins which help automatically populate configuration records with this linking information.
However, in order to support this activity they need knowledge of the records and fields that appear
in the target schema.
To obtain this information, PedroConfigurationTool prompts the end-users to select a target
model folder. This folder is expected to contain an XML Schema. However, it is not expected to
contain an existing configuration file because this is what the Configuration Tool is supposed to
produce.
The tool reads the model folder containing the target schema and holds information about it in a
special Pedro context object. This object is then stored with the key
“TARGET_SCHEMA_APPLICATION_CONTEXT” within the configuration tool’s own
PedroApplicationContext variable.
The tool’s plugins use information held in this object to help fill in the linking information expected
in the configuration records. The plugin
pedro.configurationTool.DefaultConfigurationFileCreationPlugin provides a good
example of how information in the target schema application context is used to fill in configuration
records.
5.8.2.2 ConfigurationReader
Data modellers use the Pedro Configuration Tool to produce the file “ConfigurationFile.xml”,
which describes all the application properties that are associated with XML Schema concepts.
When the Pedro application starts, it reads the configuration file and holds the information in
instances of data container classes. These classes have properties which are analogous to classes
defined in the XML Schema for the Pedro Configuration Tool (See Appendix A).
Various parts of Pedro use the ConfigurationReader to obtain configuration data that are used to
render the application. For example, pedro.desktopDeployment.TextFieldView uses the
ConfigurationReader to determine whether it should link an edit field defined in the domain schema
to ontology services. In another case, Pedro’s menu classes use the ConfigurationReader to
determine which standard menu items should be included for display.
5.8.2.3 Other Configuration Files
Pedro maintains two other configuration files in the config directory of a model folder:
• SessionAspects.xml
• FileExtensionsToLaunch.xml
The SessionAspects.xml file contains information about the most recently accessed files, and is
managed by the class pedro.mda.config.SessionManager. Pedro uses a list of recently accessed
files to create the “Favourites” sub-menu located in the File Menu. It also uses session information
to set the default starting directory for when users open files.
FileExtensionsToLaunch.xml associates file extensions with shell commands used to launch
other software applications. The mappings are used when users press the “View” button on a URL
Field View. If the file specified in the text field ends with a recognised extension, Pedro will try to
launch an application by making a system call.
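The extension lookup behind the “View” button could be sketched as below. This is not the FileLauncher class itself: the class name, the register(...) and commandFor(...) methods, the case-insensitive matching and the example command are assumptions, and the sketch stops short of making the system call.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

final class FileLaunchSketch {

    // Maps a lower-case file extension (without the dot) to a shell command.
    private final Map<String, String> commandByExtension = new LinkedHashMap<>();

    void register(String extension, String command) {
        commandByExtension.put(extension.toLowerCase(Locale.ROOT), command);
    }

    // Returns the command registered for the file's extension, if any.
    Optional<String> commandFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty(); // no extension to look up
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(commandByExtension.get(extension));
    }

    public static void main(String[] args) {
        FileLaunchSketch launcher = new FileLaunchSketch();
        launcher.register("pdf", "acroread"); // hypothetical mapping
        System.out.println(launcher.commandFor("report.PDF").orElse("no application registered"));
        System.out.println(launcher.commandFor("notes.xyz").orElse("no application registered"));
    }
}
```

Once a command has been found, the application could be started with the standard java.lang.ProcessBuilder class, passing the command and the file name as arguments.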
FileExtensionsToLaunch.xml remains the only configuration file that end-users still have to configure. Currently, the
SessionManager uses the class FileLauncher to parse the file of mappings. In future, this file will
be eliminated in favour of having the same information represented as properties in the Pedro
Configuration Tool.
5.8.3 Design History
Prior to Pedro 2.0, data modellers had to craft the ConfigurationFile.xml file by hand. The file was
awkward and time-consuming to maintain, especially when large XML schemas were being used.
This led to the idea of using Pedro to edit its own configuration files. The Pedro Configuration
Tool was developed using a schema of configuration properties and a collection of plugins which
helped data modellers fill in the forms.
The advent of the Pedro Configuration Tool meant that a configuration file could be designed far
more rapidly than it could in previous releases. Enshrining configuration options in an XML
Schema also meant it was easy to add new properties.
The configuration options for Pedro expanded to include the selective inclusion of menu items and
services which could be associated with a field, record or document scope of effect. With new
features came data container classes that held the property values in memory.
The expansion resulted in a configuration file that can accommodate 59 more options. The only
drawback for enhancing the configuration system is that configuration files developed in Pedro v1.9
are not compatible with those made in Pedro v2.0.
The configuration files produced by the Pedro Configuration Tool and the Pierre Configuration tool
share many of the same properties that are associated with concepts in a target schema. To allow
Pierre to re-use Pedro code, the configuration reader classes for each tool were made to implement a
pedro.mda.config.SchemaConceptManager interface. Some of the calls to
ConfigurationReader that once appeared in the Pedro code base have been replaced by calls to
SchemaConceptManager.
5.8.4 Scope of Effect
The Configuration Reader is referenced in the file menu classes to determine what features should
be included. Many of the classes which generate form fields interact with the ConfigurationReader
via the SchemaConceptManager interface.
5.8.5 Relevant Code Packages
The code for the PedroConfigurationTool can be found in the pedro.configurationTool.*
package. Code for ConfigurationReader and the data container classes can be found in the
package pedro.mda.config.*.
5.9 IO
5.9.1 Purpose
to store and retrieve form data managed by Pedro
5.9.2 Description
Pedro normally stores a data set as a zipped file ending in a *.PDZ file extension. The zipped file
contains a number of XML files, each of which represents a layer of information. Currently there
are two layers: the data layer and the meta-data layer. The data layer is represented by the .PDR file
and contains the text that would appear in form fields. The tags found in the data layer will be
defined in the target schema used to drive the data entry application.
The meta-data layer is represented by the .META file and contains meta data about the data set,
including basic information about the author and about all the ontology terms which were used to
mark-up form fields. The tags found in the meta-data layer are defined in the schema:
./models/pedro_meta_data/model/PedroMetaData.xsd.
The IO system for creating PDZ files can be extended to include other information layers (see
Design for Extensibility section).
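The layered archive can be illustrated with a short sketch that uses only the standard java.util.zip classes. It is not Pedro's I/O code: the class PdzLayerSketch, its method names and the entry names used in the usage note are assumptions, and the layers are held as strings to keep the example small.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class PdzLayerSketch {

    /** Writes each layer (entry name -> XML text) into one zipped archive. */
    static byte[] writeLayers(Map<String, String> layers) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> layer : layers.entrySet()) {
                zip.putNextEntry(new ZipEntry(layer.getKey()));
                zip.write(layer.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads every entry of the archive back into a name -> XML text map. */
    static Map<String, String> readLayers(byte[] archive) {
        Map<String, String> layers = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int count;
                // read() returns -1 at the end of the current entry
                while ((count = zip.read(buffer)) != -1) {
                    content.write(buffer, 0, count);
                }
                layers.put(entry.getName(),
                           new String(content.toByteArray(), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return layers;
    }
}
```

A caller might pass a map holding hypothetical entries such as "sample.pdr" and "sample.meta"; a further layer would simply be one more entry in the map.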
Pedro can export a data set as an XML file which will only contain information from the data layer.
This export feature appears in the “Export to Final Submission Format” menu option but will
probably be relabelled something more appropriate in the future.
5.9.3 Design History
5.9.3.1 Use of Layers
Originally, Pedro stored a data set as a single XML file. The need to store a data set as a collection
of layers arose with the development of ontology services. Initially, ontology services provided text
phrases that would be pasted into forms. However, an ontology term is not adequately represented
by a word phrase. Ontology terms were eventually redesigned to use a human-readable label and a
machine-readable identifier (see Section 4.5).
Although the labels for selected ontology terms were stored into form fields, Pedro needed some
mechanism for storing information about the unique identifiers. Initially, ontology terms were
written as hyperlinks in the XML data file. They were stored in the form:
<a href="[unique_identifier]">label</a>
The problem with this approach was that data sets marked up with ontology terms would fail to
validate against the XML schema. This was because the schema would not describe the “<a>” tag
which appeared within the tags for a form field. Rather than treating “<a>” as a tag with special
significance, I decided to store ontology term identifiers in a separate meta data file that would
accompany the data file. Pedro was modified so that its data sets were stored in ZIP files that
contained multiple information layers.
Figure 5-13 illustrates the structure of a native format *.pdz file.
Figure 5-13: the structure of Pedro’s native format *.PDZ file. It contains a *.pdr file which holds
the form data and a *.meta file which holds the meta data about the data set.
5.9.3.2 Changing Parsers
Pedro used to rely entirely on the DOM parser and still uses it for parsing the meta data file. The
DOM parser works by parsing an XML file and producing an in-memory tree of DOM objects. The API
for DOM objects made it easy to extract information from the XML file. The parser performed well
with small data sets but exhibited performance problems when it was used to process large data
sets. This is because the parser loads an entire XML file into memory before the DOM objects can
be used. The application experienced great performance gains in reading files when some of the
I/O classes began using the SAX parser, which reports elements as it reads them rather than
building a tree.
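The difference in approach can be seen in a small SAX example built on the standard javax.xml.parsers and org.xml.sax APIs. It is a sketch rather than Pedro's reader: the class SaxRecordCounter and the element names in the usage note are assumptions. The handler is called for each opening tag as the input streams past, so no document tree is held in memory.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that counts how often one element appears.
class SaxRecordCounter extends DefaultHandler {
    private final String elementName;
    private int count;

    SaxRecordCounter(String elementName) {
        this.elementName = elementName;
    }

    // Called once per opening tag while the parser streams through the input.
    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        if (qName.equals(elementName)) {
            count++;
        }
    }

    /** Parses the XML text and returns the number of matching elements. */
    static int count(String xml, String elementName) {
        try {
            SaxRecordCounter handler = new SaxRecordCounter(elementName);
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.count;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

For example, SaxRecordCounter.count(xml, "Sample") would count Sample elements in a data layer without ever building DOM objects for them.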
5.9.3.3 Support for Streams
Pedro's IO classes were modified so they could accept data streams instead of just files. This was done
to make it easier to deploy Pedro as a component rather than as a standalone application. In a
component mode of activity, Pedro may receive its data input as a stream coming directly from
another component.
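One common way to arrange this, shown here as a sketch under assumed names rather than as Pedro's actual code, is to put the real work in a method that takes an InputStream and to keep the file-based entry point as a thin wrapper around it.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical reader; the class and method names are not from the Pedro code base.
class StreamReaderSketch {

    /** Core routine: works on any stream, e.g. one supplied by another component. */
    static String readAll(InputStream in) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    /** Convenience overload for standalone use: opens the file as a stream. */
    static String readAll(File file) {
        try (InputStream in = new FileInputStream(file)) {
            return readAll(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape, a host component can hand over a stream directly while the standalone application continues to work with files.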
5.9.3.4 Creating the “Export to Final Submission” Feature
Pedro used to allow end-users to export native format *.pdz files to *.xml files that only contained
the data layer of information. The *.xml files tended to be candidate files for submission to data
repositories. I decided to rename this feature to “Export to Final Submission Format” and to make
it validate the document first. If there are any errors, Pedro does not create the *.xml file. This
ensures that end-users fix all the errors before they send their files off to repository managers.
5.9.3.5 Providing Support for the Meta Data Layer
For most of its development cycle, Pedro has saved an arbitrary collection of meta data in the
*.meta layer. Typically this focused on recording which ontology terms were used to tag a
particular kind of schema concept such as a form field or record.
Whenever a new attribute was added, it resulted in changes made to special I/O routines which read
and wrote meta data records. In 2007, the *.meta layer was given its own distinct schema for meta
data. Each Pedro tool now loads the pedro_meta_data model and uses a special context variable
(see Section 4.3) to help read and write meta data records. These records are maintained
independently of the form data that end-users edit through the normal use of the tool.
A new utility has been designed which will allow data curators to edit just the meta data layer of a
given *.pdz file. The Pedro Meta Data Editor uses the same pedro_meta_data model but allows
curators to post-annotate a *.pdz file. Curators can now remove ontology terms that were used to
tag records and fields. Alternatively, they can add more terms using the same ontology services that
are available to end-users.
With the new support for the *.meta layer, data curators can change the meta data about a data set
without editing the data themselves. The layers can be maintained completely independently of one
another.
5.9.3.6 Merging dataImport and IO Class Packages
Pedro v1.9 had separate packages for classes that managed Pedro data files and those that were used
to import data from or export data to spreadsheets. Now all of the classes in the
pedro.dataImport.* package have been moved into the pedro.io.* package.
5.9.4 Scope of Effect
Most of the I/O classes are defined in the pedro.io.* package. Whereas the meta data used to be
managed by pedro.io.MetaDataReader and pedro.io.MetaDataWriter classes, meta data
records are now read and written using the normal PedroDataFileReader and PedroDataFileWriter
classes respectively.
Most of the IO classes are called in the pedro.desktopDeployment.FileMenu or
pedro.tabletDeployment.FileMenu classes.
5.9.5 Relevant Code Packages
The IO classes appear in pedro.io.*. PedroDataFileReader/Writer are used to manage the .PDR
files that represent the data layer of each data set. NativeDataFileReader/Writer uses these classes
when it manages the zipped .PDZ files. XMLSubmissionFileReader/Writer wraps
PedroDataFileReader/Writer and produces .XML files.
5.10 Alerts
5.10.1 Purpose
to provide a means of allowing end-users to compile their own lists of errors, warnings and tips that
can be used by other researchers in the community when they validate their data sets. It is intended
as an extensible way to enhance validation facilities of the tool and a passive way that researchers
can communicate with each other.
5.10.2 Description
Pedro supports an Alerts System to supplement the tool’s standard validation facilities and to give
end-users a way of using advice provided by other end-users. An alert is a set of matching criteria
which identifies patterns of field value combinations in a record. The matching criteria can be
associated with one of four intents:
• error
• warning
• information bulletin
• request for the user to contact someone else
Domain experts can use the Pedro Alerts Editor to create a collection of alerts called an Alert
Bundle. The bundle is a ZIP file that contains a small XML file to represent each alert. The alert
bundles can be included as part of the release of a new model folder, or they can be hosted at some
URL.
Other end-users can import these alert bundles and use them in two situations:
• when they attempt to export the document to a final submission format
• when they use the “Show Errors” feature in the View Menu
When either of these actions is taken, the tool scans the current document and identifies any
records which match an alert. The results are included with any errors the system identifies in the
document. Should any of the matching alerts represent errors, the task of exporting data to a final
format will fail.
5.10.3 Design History
The Alert system was originally developed because Pedro had no way of identifying errors which
were due to combinations of field values found within the same record. Individual field values
could be validated but they could still present errors when they were considered in conjunction with
other field values on the same form. For example, a patient record form could have fields “gender”
and “cancer type”. “male” and “ovarian cancer” could represent legal values for their respective
fields but represent an illegal combination of values. The system was generalised to include
warnings and other kinds of messages that might prove useful in an activity of standardised data
entry.
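The idea of matching a combination of field values within one record can be sketched as follows. The class AlertSketch, its Intent enum and the use of exact string equality are assumptions for illustration; the real classes in pedro.soa.alerts may support richer criteria.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified alert: the real classes in pedro.soa.alerts may differ.
class AlertSketch {
    enum Intent { ERROR, WARNING, INFORMATION, CONTACT_REQUEST }

    final Intent intent;
    final String message;
    // Matching criteria: field name -> value that must be present in the record.
    private final Map<String, String> criteria = new LinkedHashMap<>();

    AlertSketch(Intent intent, String message) {
        this.intent = intent;
        this.message = message;
    }

    /** Adds one criterion and returns this alert so calls can be chained. */
    AlertSketch when(String fieldName, String value) {
        criteria.put(fieldName, value);
        return this;
    }

    /** The alert fires only if every criterion matches a field of the same record. */
    boolean matches(Map<String, String> record) {
        for (Map.Entry<String, String> criterion : criteria.entrySet()) {
            if (!criterion.getValue().equals(record.get(criterion.getKey()))) {
                return false;
            }
        }
        return true;
    }
}
```

Using the patient record example, an ERROR alert built with when("gender", "male") and when("cancer type", "ovarian cancer") would match a record holding both values but not a record where only one of them appears.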
Now that Pedro supports field validation services and general plugins at the field, record and
document levels of data entry activity, it is unclear whether this system will become more popular.
The Pedro Alerts system is limited in that it will only identify patterns of values found within the
same record. However, the system allows end-users to enhance the tool’s validation capabilities
without having to code plugins. This benefit may prove important in settings where there is a
scarce availability of software developers to make validation plugins.
Pedro’s validation package has been modified so that validation routines return a collection of alerts
rather than a String that could contain an error message. This was done to allow validation plugins
to return results that reflected different kinds of data quality.
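A sketch of what such a routine might look like is given below. The classes ValidationAlert and PercentageFieldValidator and the rules they apply are invented for illustration and are not part of the pedro.soa.validation API. Returning a list lets one call report several findings of different severity, and an empty list means the value passed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical alert type carrying a severity as well as a message.
class ValidationAlert {
    enum Severity { ERROR, WARNING, INFORMATION }

    final Severity severity;
    final String message;

    ValidationAlert(Severity severity, String message) {
        this.severity = severity;
        this.message = message;
    }
}

// Hypothetical field validator; the rules are examples only.
class PercentageFieldValidator {

    /** Returns all findings for the value; an empty list means it is acceptable. */
    List<ValidationAlert> validate(String fieldName, String value) {
        List<ValidationAlert> alerts = new ArrayList<>();
        try {
            double number = Double.parseDouble(value);
            if (!(number >= 0 && number <= 100)) {
                // also catches NaN, which fails both comparisons
                alerts.add(new ValidationAlert(ValidationAlert.Severity.ERROR,
                    fieldName + " must lie between 0 and 100"));
            } else if (number == 0) {
                alerts.add(new ValidationAlert(ValidationAlert.Severity.WARNING,
                    fieldName + " is zero; please check this is intended"));
            }
        } catch (NumberFormatException e) {
            alerts.add(new ValidationAlert(ValidationAlert.Severity.ERROR,
                fieldName + " is not a number"));
        }
        return alerts;
    }
}
```

A single String result could only say whether something was wrong; the list form can carry a warning without blocking the user.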
5.10.4 Scope of Effect
The Alerts package is becoming more intertwined with the Validation package. Validation services
described in the pedro.soa.validation.* package are required to return a collection of Alerts when
they validate a field, record or document. Many of the error messages thrown by parts of the
system are encapsulated in a SystemErrorAlert(..) object, which is an instance of an Alert.
5.10.5 Relevant Code Packages
Most of the classes used to support alerts functionality are found in the pedro.soa.alerts.*
package. The Alerts Editor can be run by invoking pedro.desktopDeployment.PedroAlerts.
5.11 Meta Data System
5.11.1 Purpose
to capture and isolate meta data about a document. The meta data forms a summary view that can
be used for simple search and retrieval operations supported by data dissemination systems.
Pedro was designed to accommodate large data files in bioinformatics. A layer of meta data was
added to the native format *.pdz files so that data dissemination systems could interpret a small
meta data file before having to find search criteria in the much larger data layer.
5.11.2 Description
Pedro has an internal system for maintaining meta data about documents. The kinds of meta data
that are managed include:
• summary information including the title, author, e-mail, institution and description of the
data set
• the number of each kind of record that appears in the data set
• provenance data about all the ontology terms that are used to mark-up form fields
Some of this information is provided by the end-users. Figure 5-14 shows the dialog that appears
when they select the “Describe this document” feature from the Options menu. The title, author,
e-mail, institution and description values are saved as part of the meta data for the document.
Figure 5-14: the meta data dialog that appears when end-users select “Describe this document”
option in the Options menu of a Pedro dialog. When end-users use the dialog to create a summary
of their data set, the information is stored in the *.meta layer.
The remaining meta data are captured automatically. Pedro monitors how many instances of each
kind of record appear in the document. When end-users use ontology services to mark-up form
fields, the tool asks the services to provide provenance data about all the selected terms. These data
are also stored as meta data.
Meta data are stored as an information layer within the *.pdz native file format (see Section 5.9).
The layer is expressed as an XML data file that is defined by a schema. Using the Pedro Meta Data
Editor, data curators can edit the meta data file independently of the form data. They can edit
summary information or ontology terms to suit changes in the way documents are classified in a
data repository. The following sections describe aspects of the meta data system in more detail.
5.11.2.1 Walkthrough for Capturing Ontology Term Meta Data
The process of committing meta data about ontology terms to file begins when an end-user right-
clicks on the label of a form field that supports ontology services. This walkthrough traces the
activity in the desktop deployment of the tool.
If an instance of pedro.desktopDeployment.TextFieldView has been linked with ontology
services, the object associates its starred form label with
pedro.soa.ontology.views.OntologyServiceManager. This class is responsible for presenting
the available ontology services to end-users and ensuring the selected terms appear in the text field.
OntologyServiceManager delegates the task of listening to right-click mouse actions to a
ServiceMenuListener class. When the ServiceMenuListener detects a right-click over the form
label, it causes the OntologyServiceManager to generate a popup menu of available services. If a
service has fewer than 40 terms, it attempts to render them as menu items. Otherwise, it displays a
“Select terms...” button which causes an OntologyViewer to display the terms.
When an ontology service is selected, the OntologyViewer is associated with an
OntologyTermSelectionListener. When end-users have selected terms for mark-up, the viewer
notifies the OntologyTermSelectionListener. The listener is then supposed to ask the viewer to
return meta data about each term that has been selected. OntologyViewer obliges by returning a
collection of OntologyTermProvenance objects.
Pedro makes use of a DefaultOntologyTermListener which performs the mark-up action. It adds
the OntologyTermProvenance objects to OntologyContext, which maintains information about
the content of fields that are currently displayed. OntologyContext in turn adds the provenance
objects to OntologyTermProvenanceManager. This object maintains information about all the
ontology terms that have been used to mark-up the whole document.
The OntologyTermProvenanceManager is owned by a DocumentMetaData object, which holds
information about meta data for the whole document. This is the object that is used by
NativeFileFormatWriter and NativeFileFormatReader to serialise the meta data information to
a *.meta XML file.
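The chain of hand-offs in this walkthrough can be condensed into a sketch. The classes below reuse names from the walkthrough, but they are simplified stand-ins: every field, constructor and method shown is an assumption, and the Swing and service plumbing is left out entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-ins that share names with Pedro classes; all members are assumptions.
class OntologyTermProvenance {
    final String identifier;
    final String label;

    OntologyTermProvenance(String identifier, String label) {
        this.identifier = identifier;
        this.label = label;
    }
}

// Keeps track of every term used to mark up the whole document.
class OntologyTermProvenanceManager {
    private final List<OntologyTermProvenance> terms = new ArrayList<>();

    void addProvenance(OntologyTermProvenance provenance) { terms.add(provenance); }

    List<OntologyTermProvenance> getTerms() { return Collections.unmodifiableList(terms); }
}

// Owns the provenance manager; this is what the native file writer would serialise.
class DocumentMetaData {
    private final OntologyTermProvenanceManager manager = new OntologyTermProvenanceManager();

    OntologyTermProvenanceManager getProvenanceManager() { return manager; }
}

// Passes provenance for the currently displayed fields on to the document level.
class OntologyContext {
    private final OntologyTermProvenanceManager manager;

    OntologyContext(DocumentMetaData documentMetaData) {
        this.manager = documentMetaData.getProvenanceManager();
    }

    void addTerms(List<OntologyTermProvenance> selectedTerms) {
        for (OntologyTermProvenance provenance : selectedTerms) {
            manager.addProvenance(provenance);
        }
    }
}

interface OntologyTermSelectionListener {
    void termsSelected(List<OntologyTermProvenance> selectedTerms);
}

// Performs the mark-up action by handing the selected terms to the context.
class DefaultOntologyTermListener implements OntologyTermSelectionListener {
    private final OntologyContext context;

    DefaultOntologyTermListener(OntologyContext context) { this.context = context; }

    @Override
    public void termsSelected(List<OntologyTermProvenance> selectedTerms) {
        context.addTerms(selectedTerms);
    }
}
```

In this reduced form, a viewer only needs to call termsSelected(...) on its listener; the provenance then ends up in the DocumentMetaData object without the viewer knowing how meta data are stored.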
5.11.2.2 The Pedro Meta Data Editor
The Pedro Meta Data Editor is an instance of Pedro that has been customised to edit the meta data
layer of *.pdz files. The forms for the tool are generated by the schema described in Appendix C.
The code used to make the plugins is explained more in Section 9.7.
The Meta Data Editor allows data curators to alter the meta data layer independently of the data
layer. The editor is designed to let them annotate the document with terms that come from the same
ontology services that are available to a regular end-user.
5.11.3 Design History
Section 5.9 describes much of the design history that led to developing code to support document
meta-data. Up until Pedro v1.9, the meta data were maintained automatically by the tool and there
was no simple way that a data curator could edit the file. The activity of gathering meta data about
a document would depend on how often the end-users would make use of ontology services.
It became clear that Pedro should support the needs of data curators. We observed that most
end-users wanted to minimally fill in a document so they could fulfil requirements set out by journal
publications. However, data curators were concerned with ensuring that documents were
sufficiently tagged so that they could be detected in search operations applied to a data repository.
The recognition of a data curator as a new kind of user led to the development of the Pedro Meta
Data Editor.
5.11.4 Scope of Effect
Pedro’s meta data classes are used extensively by the ontology services and are used by the native
file format I/O classes to create the *.meta layer in each *.pdz file.
5.11.5 Relevant Packages
Most of the meta data classes appear in the pedro.metaData.* package. The
OntologyTermProvenanceManager used to manage meta data about ontology terms appears within
the pedro.soa.ontology.provenance.* package.
5.12 Form Generation Facilities
5.12.1 Purpose
to generate UIs for Pedro which suit displays on desktop and Tablet PCs.
5.12.2 Description
Pedro’s architecture maintains a rigid separation between the structures that hold data and structures
that present them to an end-user. The model aspects are represented by the native data structures
described in Section 5.2. The view aspects are represented by the following packages:
• pedro.desktopDeployment.*
• pedro.tabletDeployment.*
5.12.2.1 General Classes for Generating Desktop Pedro Forms
Figure XXX describes the major classes used to render forms in Desktop Pedro. When end-users
invoke the “run_pedro” script, it spawns an instance of PedroApplication. The object prompts
end-users to load a model and it produces template record definitions that can be used to populate a
document. PedroApplication creates an instance of an empty PedroDialog, which is the main
window for the data entry tool. Each PedroDialog has a PedroMenuBar containing a variety of
menus subclassed from PedroMenu. PedroMenu contains general-purpose code for handling
plugins. Each PedroDialog will also have a NavigationTree, which is the tree display that
appears on the left part of the window. The NavigationTree manages a tree of NavigationTreeNode
objects that mirror the tree of RecordModel objects which comprise the document. The third major UI
component of the PedroDialog is the RecordView object that displays the form fields for the
currently selected record in the NavigationTree. The RecordView comprises a RecordViewTitle
which appears flush left at the top of the panel, and a collection of DataFieldView objects which
render form fields. Each DataFieldView object uses a corresponding DataFieldModel object that
is part of the currently selected RecordModel object.
Figure XXX: major classes used to generate forms for Desktop Pedro.
5.12.2.2 Classes for Generating Edit Fields in Desktop Pedro Forms
Figure XXX describes a more detailed view of UI classes that are used to render text fields,
identifier fields, combination box fields and radio fields. EditFieldView contains code which can
render properties of a corresponding EditFieldModel object. It is responsible for rendering a
“Plugins” button for any form field that has been associated with a collection of plugins. The other
field view classes use properties defined in corresponding edit field model objects. RadioFieldView
uses the choices provided by GroupFieldModel to render a group of radio buttons. If
GroupFieldModel provides more than three choices, Pedro uses a CombinationFieldView to render
the items as a drop-down list. URIFieldViews use TextFieldModel objects whose XML Schema
definitions used “xs:anyURI” for the type attribute (See Appendix B). IDFieldView uses an
instance of IDFieldModel to render a text field that is accompanied by a “Generate Key” button.
The identifier service used to provide an identifier is a property of the IDFieldModel.
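The rule for choosing between radio buttons and a drop-down list can be written down in a few lines. The class GroupFieldViewChooser and its enum are hypothetical; only the threshold of three choices comes from the description above.

```java
import java.util.List;

// Hypothetical helper; in Pedro the decision is made inside the view-generating code.
class GroupFieldViewChooser {
    // Threshold taken from the text: more than three choices means a drop-down list.
    static final int MAX_RADIO_CHOICES = 3;

    enum ViewKind { RADIO_BUTTONS, COMBINATION_BOX }

    /** Picks the kind of view to render for the given list of choices. */
    static ViewKind choose(List<String> choices) {
        return (choices.size() > MAX_RADIO_CHOICES)
            ? ViewKind.COMBINATION_BOX
            : ViewKind.RADIO_BUTTONS;
    }
}
```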
Figure XXX: Classes responsible for rendering edit fields in Desktop Pedro
5.12.2.3 Classes for Generating List Fields in Desktop Pedro Forms
Figure XXX shows classes responsible for rendering list fields. A ListFieldView comprises a
ListTypeManager, a ListValueManager and it uses an instance of ListFieldModel.
ListTypeManager manages the type of child record that will be created or edited when end-users
press “New” or “Edit” buttons. There are implementations of ListTypeManager which
accommodate lists that support one or multiple types. MultiListTypeManager renders a
combination box populated with the kinds of records that can appear in the list field.
ListValueManager is an abstract class that manages the display of list items.
SingleListValueManager represents a one item list as a single non-editable text field that shows
the display name of the child record. MultiListValueManager shows the child records in a
scrollable list of record names.
List fields are created by the pedro.desktopDeployment.RecordViewFactory class. It
determines the type of list field to produce by inspecting the fieldViewType attribute of the
ListModel object.
Figure XXX: Classes responsible for rendering list fields in Desktop Pedro.
5.12.2.4 Classes for Generating Forms in Tablet Pedro
TabletPedro was designed to support the same data models as those used to run Desktop Pedro.
However, we envisioned that the different forms of deployment would be used at different stages of
editing the same document. End-users will tend to use TabletPedro to do simple data entry tasks
which require them to be at a work site such as a laboratory or a remote field location. They will
tend to use Desktop Pedro to support complex data entry tasks and to fill in the rest of the
document.
TabletPedro was designed with the following principles in mind:
• minimise the feature set of the application to support essential tasks
• minimise the use of pop-up dialogs
• economise on screen real estate
• change features which rely on right-click menus
The classes used to generate forms in Tablet Pedro are in the pedro.tabletDeployment.*
package. Many of them share the same name as other classes that cater to supporting Desktop
Pedro. There are a few notable differences.
First, some of the menu items have been removed. For example, the File Menu does not contain an
“Export to Final Submission Format” button. This feature was deemed non-essential because it was
likely this would be done with the Desktop version.
To reduce the number of pop-up dialogs, some features were altered so they used a stack of screens
instead of separate windows. For example, consider the “Window” feature in Desktop Pedro.
When end-users switch from one file to another, a new window grabs focus. In Tablet Pedro, only
one file is shown at a time. When end-users change the current file, it changes the file loaded in the
current window.
As another example, the DefaultOntologyViewer was altered so it relied on JPanel objects
instead of JDialog objects. This was done so the ontology viewer was more easily embedded in a
stack of windows.
To economise on screen real estate, the NavigationTree was removed. It is accessible via the
“WhereAmI” button, which pushes a view of the NavigationTree onto the stack of windows.
TabletPedro supports navigation via a RecordStack object. RecordStack is a drop down list that
shows the currently edited branch of the tree.
Finally, the right-click mechanism for activating ontology services is replaced by pressing a
“Mark up” button which appears at the end of a text field.
5.12.3 Design History
Initially, Pedro tightly coupled data objects with the objects that viewed them. This led to severe
performance problems because Java would potentially be managing thousands of UI components,
each using a collection of Swing objects. This led to stripping the native data structures of
references to UI components. Eventually the production of form field views was centralised in the
class pedro.desktopDeployment.RecordViewFactory.
The most significant change in the form generation classes came when Tablet Pedro was developed.
Initially we wanted to run Pedro on a PDA. However, these devices often require a specialised
form of the JVM which does not support Swing components. Although the native data structures
were stripped of references to UI components, they still had one crucial reference to the swing
libraries: the ChangeListener class. We realised it would not be easy to port the code base to a PDA
platform so instead we decided on a Tablet PC platform.
It became clear that we had to make some changes to the forms. The NavigationTree was taking up
too much room in the display and it was difficult to tap a pen on some of the nodes. Too many
windows popped up and they cluttered the screen area. We also observed that it was difficult to
activate ontology services. In the Tablet display, a right-click action is done by tapping and holding
the pen on a form label. This seemed too awkward to do, so a “Mark up” button was developed
instead.
The development of TabletPedro took three consultation meetings with mass spectrometry scientist
Jennifer Lynch and ten business days of coding. The result proved that Pedro’s design isolated its
model aspects enough to develop new ways of visualising them.
5.12.4 Scope of Effect
The UI classes are probably not used by anything other than classes in Desktop and Tablet
deployment packages.
5.12.5 Relevant Code Packages
The classes used to generate user interfaces in Pedro are in the pedro.desktopDeployment.* and
pedro.tabletDeployment.* packages.
6 Extending the Core Code Base
This section describes ways that developers could extend the core code base. The following
subsections represent tasks which may be part of future enhancements or may represent common
requests within the Pedro developer community.
6.1 Replacing the schema parser
Pedro communicates with the schema parser via the SchemaReaderInterface, which can be
implemented to rely on XML Schema or other data modelling technologies. A few years ago, we
tried to make Pedro run off a data model created by a knowledge acquisition system called PCPack.
It was able to generate forms for simple models and proved that the application could be insulated
from changes in the way models were read.
To adapt Pedro to suit another model interpreter:
1. create a Java class that implements the pedro.mda.schema.SchemaReaderInterface
interface.
2. develop code to parse the model file.
3. use the model properties extracted from the model to build up definitions of RecordModel
objects (see pedro.mda.model package and Appendix B)
4. register the RecordModel objects as templates in the RecordModelFactory object. These
templates are used to instantiate records created in the program.
5. change code in the constructor of pedro.mda.schema.Startup so that it instantiates a copy of
the new schema reader instead of MSVSchemaReader
The most important part of this activity is to be able to map model concepts to attributes in
RecordModel, EditFieldModel, ListFieldModel, IDFieldModel, GroupFieldModel and
DataFieldModel classes found in the pedro.mda.model packages.
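The steps above can be sketched with a toy model format. None of the types below are Pedro's: SchemaReaderSketch stands in for SchemaReaderInterface, RecordTemplate for RecordModel and RecordTemplateRegistry for RecordModelFactory, and their methods are assumptions because the real signatures are not reproduced in this document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for SchemaReaderInterface (step 1); the real methods may differ.
interface SchemaReaderSketch {
    void readSchema(String modelText, RecordTemplateRegistry registry);
}

// Stand-in for a RecordModel definition (step 3).
class RecordTemplate {
    final String recordName;
    final List<String> fieldNames = new ArrayList<>();

    RecordTemplate(String recordName) { this.recordName = recordName; }
}

// Stand-in for RecordModelFactory, which holds the templates (step 4).
class RecordTemplateRegistry {
    private final Map<String, RecordTemplate> templates = new LinkedHashMap<>();

    void registerTemplate(RecordTemplate template) {
        templates.put(template.recordName, template);
    }

    RecordTemplate getTemplate(String recordName) { return templates.get(recordName); }
}

// Toy model format (step 2), one record per line: "RecordName: field1, field2".
class LineFormatSchemaReader implements SchemaReaderSketch {
    @Override
    public void readSchema(String modelText, RecordTemplateRegistry registry) {
        for (String line : modelText.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // skip lines that do not define a record
            }
            RecordTemplate template = new RecordTemplate(line.substring(0, colon).trim());
            for (String field : line.substring(colon + 1).split(",")) {
                if (!field.trim().isEmpty()) {
                    template.fieldNames.add(field.trim());
                }
            }
            registry.registerTemplate(template);
        }
    }
}
```

Step 5, swapping the reader that is instantiated at start-up, has no counterpart here because it depends on code in the Pedro start-up class.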
6.2 Adding an extra data layer
Pedro currently has two data layers inside each native .pdz file:
• a .pdr file which holds the data that appears in the data set
• a .meta file which holds meta data about the file. Most of the data held in this file represent
the ontology terms which were used to mark up record fields.
The information layers are described in more detail in the I/O System described in Section 5.9. In
the future, native files may contain additional layers which describe data quality or aspects of
provenance. The way the .meta layer was developed provides a good example of how a new layer
of information could be supported. For more information on this, please see Section 5.11, which
deals with the design of Pedro’s Meta Data sub-system. The steps for adding a new layer are:
1. Develop data container classes which will hold the information you want to maintain.
2. Create a manager class that manages the data container classes
3. Make the manager accessible to other parts of Pedro by registering it as a new variable in
the pedroFormContext variable. You may have to add this change to classes such as
pedro.desktopDeployment.PedroApplication and others that have a main method.
4. Use the hash key you created in the previous step to access the variable from within
pedro.io.NativeFileReader and pedro.io.NativeFileWriter.
5. Decide on a file extension to use for the layer, eg: “*.provenance”
6. Access the manager object and use it to read and write a layer file ending with the extension
you wanted. The new layer should appear in *.pdz files.
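A compact sketch of these six steps follows. The only name taken from Pedro is getApplicationProperty, which appears on the pedroFormContext variable elsewhere in this document; the container, manager, context stand-in, hash key and setApplicationProperty method are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Step 1: a data container class (hypothetical).
class ProvenanceNote {
    final String text;

    ProvenanceNote(String text) { this.text = text; }
}

// Step 2: a manager for the container objects (hypothetical).
class ProvenanceNoteManager {
    private final List<ProvenanceNote> notes = new ArrayList<>();

    void addNote(String text) { notes.add(new ProvenanceNote(text)); }

    // Serialises the notes; a real writer would also escape XML special characters.
    String toLayerXml() {
        StringBuilder xml = new StringBuilder("<provenance>");
        for (ProvenanceNote note : notes) {
            xml.append("<note>").append(note.text).append("</note>");
        }
        return xml.append("</provenance>").toString();
    }
}

// Stand-in for the pedroFormContext variable: a keyed property table.
class FormContextSketch {
    private final Map<String, Object> properties = new HashMap<>();

    void setApplicationProperty(String key, Object value) { properties.put(key, value); }

    Object getApplicationProperty(String key) { return properties.get(key); }
}

class ExtraLayerSketch {
    static final String PROVENANCE_MANAGER = "provenanceNoteManager"; // step 3: hash key
    static final String LAYER_EXTENSION = ".provenance";              // step 5

    /** Steps 4 and 6: fetch the manager by its key and build the layer entry. */
    static Map<String, String> buildLayerEntry(String baseName, FormContextSketch context) {
        ProvenanceNoteManager manager =
            (ProvenanceNoteManager) context.getApplicationProperty(PROVENANCE_MANAGER);
        Map<String, String> entry = new HashMap<>();
        entry.put(baseName + LAYER_EXTENSION, manager.toLayerXml());
        return entry;
    }
}
```

The returned map holds the entry name and content that a native file writer would add to the zipped archive alongside the existing layers.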
If you want this layer to be editable by people, then it is good to look to the design of the Meta Data
Editor for guidance:
1. write an XML schema which describes the information you want to maintain.
2. Develop plugins which will load and save data for only the information layer you want.
You may have to replace the default File menu features “Open”, “Save” and “Exit” because
these are designed to open and save a *.pdz file, not a specific information layer within a
*.pdz file.
6.3 Creating a new field view
TabletPedro provides an example of how a new field view can be developed. The application uses
pedro.tabletDeployment.TextFieldView instead of pedro.desktopDeployment.TextFieldView when
it renders forms.
The steps:
1. Make a class which extends desktopDeployment.EditFieldView and implements
desktopDeployment.CustomisedFieldView. (eg: tabletDeployment.TabletTextFieldView)
2. use the desktopDeployment.RecordViewFactory class to associate some field view type
with the class you’ve written. (see the constructor for tabletPedro.DataEntryPanel)
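The registration idea in these two steps can be sketched as a factory that maps a field view type to a constructor. All class names carry a Sketch suffix to make clear that they are illustrative; the real RecordViewFactory and field view classes may be organised differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

interface FieldViewSketch {
    String describe();
}

// Plays the role of the desktop text field view.
class DesktopTextFieldViewSketch implements FieldViewSketch {
    @Override
    public String describe() { return "desktop text field"; }
}

// Step 1: a new view that extends the existing one.
class TabletTextFieldViewSketch extends DesktopTextFieldViewSketch {
    @Override
    public String describe() { return "tablet text field"; }
}

// Plays the role of RecordViewFactory: field view type -> constructor.
class RecordViewFactorySketch {
    private final Map<String, Supplier<FieldViewSketch>> viewTypes = new HashMap<>();

    RecordViewFactorySketch() {
        registerFieldView("text", DesktopTextFieldViewSketch::new); // default mapping
    }

    /** Step 2: associate a field view type with the class to instantiate. */
    void registerFieldView(String fieldViewType, Supplier<FieldViewSketch> constructor) {
        viewTypes.put(fieldViewType, constructor);
    }

    FieldViewSketch createFieldView(String fieldViewType) {
        Supplier<FieldViewSketch> constructor = viewTypes.get(fieldViewType);
        if (constructor == null) {
            throw new IllegalArgumentException("Unknown field view type: " + fieldViewType);
        }
        return constructor.get();
    }
}
```

A tablet deployment would call registerFieldView("text", TabletTextFieldViewSketch::new) once at start-up, after which every text field produced by the factory uses the new view.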
6.4 Adding Form Properties
The advent of the Pedro Configuration Tool has made it easy to extend the system’s capabilities of
recognising new configuration attributes.
1. Add new fields to the configuration data model (see the
models/pedro_form_configuration/model/pedro_form_configuration.xsd)
2. adjust pedro.mda.config.PedroConfigurationReader so that the parser can detect XML
tags of the new attribute.
3. You may have to add new set/get routines in any number of the configuration data structures
appearing in pedro.mda.config.*.
4. Decide where in the code base you want to access the new configuration attributes. You
will access the PedroConfigurationReader using the code such as:
PedroConfigurationReader configurationReader = (PedroConfigurationReader)
    pedroFormContext.getApplicationProperty(PedroApplicationContext.CONFIGURATION_READER);
EditFieldConfiguration editFieldConfiguration = (EditFieldConfiguration)
    configurationReader.getConfigurationRecord(recordClassName, fieldName);
??? = editFieldConfiguration.getX();
In the code example, the “X” in “getX()” represents the new configuration attribute you want to use.
You are most likely going to use the code snippet in plugins. However, many parts of Pedro have
access to the pedroFormContext variable so you can effect changes to the core code base as well.
6.5 Creating a Web-based Version of Pedro
Pedro should be extensible enough to support new forms of deployment such as a version that
works on the web. The model and view aspects of Pedro’s design were sufficiently well separated
to allow Tablet Pedro to be developed with minimal work.
An example of a web-based application that used Pedro libraries is the web-application generated
by Pierre. Code for Pierre can be downloaded at the same Source Forge site used to host Pedro.
The web application in the download relies on JSP, Java Servlets and the Struts Framework. It
could be used to inform the development of a web-based version of the data entry tool.
6.6 Upgrading to Higher Versions of Java
Since its first release in February 2003, all code for Pedro has been written using JDK1.4.
Developers should have no problem recompiling the code base to support JDK1.5 classes.
7 Future Enhancements
Pedro 2.0 is meant to be one of the last major releases of the tool. A great amount of work has been
done to redesign the software so it can accommodate a greater range of plugins. The most
important future enhancement will be replacing the existing MSV Schema Parser with Castor.
Currently Pedro is able to interpret only a fraction of XML Schema structures. We have found that
the current level of support of schema features is more than adequate for most simple use cases.
However, the benefit of enhancing the schema reader is that it could accommodate a wider range of legacy
schemas developed independently of the tool.
Future releases will also include features which have been inspired by Pierre, a project that builds
on Pedro libraries. The features will appear in the Pedro Configuration Tool as plugins, and will be
used to provide data modellers with more support for rapid prototyping activities.
7.1 Replacing the Schema Reader’s MSV Parser with Castor
7.1.1 Description
Pedro is critically dependent on its interactions with SchemaReaderInterface, which is currently
being implemented by the MSVSchemaReader. Former Pedro developer Kai Runte used Sun’s MSV
schema reader to build the class. MSV is a very flexible library of functions which help interpret
various kinds of XML Schemas. Kai’s work resulted in a schema reader that can understand
approximately 10 of XML Schema’s 40 or more concepts. This limited support has allowed Pedro
to be used in a variety of useful settings.
There are a number of reasons to rewrite the schema reader:
• MSVSchemaReader is complicated. Although it uses a Visitor pattern to traverse a syntax
tree, it makes use of a number of embedded classes and HashMap variables that make the
code difficult to understand.
• MSV appears to be an aging library maintained by a single developer.
• We know the schema reader works, but currently there are no developers on the project who
can easily enhance it.
We decided that rather than trying to enhance the MSVSchemaReader, it was better to change
schema reading technologies altogether. Chris Garwood investigated a number of technologies and
found that the Castor Project provided the best alternative. It helps to hide the details of parsing
schemas and it is a project that appears to have a group of people who maintain it.
Castor generates Java class definitions for XML Schemas. We feel that rather than have a schema
reader class parse schema syntax, it can apply Java introspection on the generated classes instead.
We are reasonably confident that we could deduce all of the existing schema properties in addition
to others such as references and inheritance.
7.1.2 Suggested Approach
1. Develop a suite of test cases that can be used to ensure that the new schema reader performs
all the same tasks as the old one. This will probably involve testing the presence or absence
of field properties in the template record definitions generated by the activity.
2. Wrap Castor so that it can be used programmatically to generate Java class definitions.
3. Introspect over the generated classes, and use property values in the same way they are
described in Appendix B.
4. Apply the suite of test cases.
5. When all test cases pass, substitute the schema reader. This should only involve changing the
reference to MSVSchemaReader found in pedro.mda.schema.Startup to instead reference
the new SchemaReader.
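The introspection step could be prototyped with the standard java.beans API. The sketch below is illustrative only: GeneratedProtein is a hypothetical stand-in for a class that Castor might generate, and describeFields() is an invented helper rather than part of Pedro. It is written for a modern JDK rather than JDK1.4.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Map;
import java.util.TreeMap;

class IntrospectionSketch {

    // Hypothetical stand-in for a class generated from a schema element.
    public static class GeneratedProtein {
        private String accession;
        private int length;
        public String getAccession() { return accession; }
        public void setAccession(String accession) { this.accession = accession; }
        public int getLength() { return length; }
        public void setLength(int length) { this.length = length; }
    }

    // Returns a map of property name to property type name, sorted by name.
    static Map<String, String> describeFields(Class<?> generatedClass) {
        try {
            // Stop at Object so that getClass() is not reported as a property.
            BeanInfo beanInfo = Introspector.getBeanInfo(generatedClass, Object.class);
            Map<String, String> fields = new TreeMap<>();
            for (PropertyDescriptor property : beanInfo.getPropertyDescriptors()) {
                Class<?> type = property.getPropertyType();
                // The type is null only for properties with indexed access alone.
                fields.put(property.getName(), type == null ? "indexed" : type.getSimpleName());
            }
            return fields;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Could not introspect " + generatedClass, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFields(GeneratedProtein.class));
    }
}
```

A real schema reader would go on to translate each discovered property into the field properties described in Appendix B; that mapping is not attempted here.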
7.1.3 Scope of Effect
Every tool made in the Pedro project relies on template record definitions produced by the schema
reader. It is critical to thoroughly test properties of the templates produced by the new schema and
compare them to those produced by the MSV Schema Reader. If there are no problems with how
the templates are produced, the rest of the tools that use it should remain unaffected for currently
supported schemas.
7.2 Auto-generate Functional Specifications
7.2.1 Description
Like Pedro, Pierre has a Configuration Tool which is used to rapidly prototype functional
specifications of applications. Both tools use the configuration file made by the configuration tools
to generate end-user applications. However, Pierre also uses the configuration file to auto-generate
functional specifications that are intended to be read by people. Developers, end-users and others
can review the HTML document while they evaluate the application prototypes. To help foster
iterations of development, Pierre allows designers to include end-user comments as part of the
specification. These comments do not affect the behaviour of generated applications. However,
they do appear in the auto-generated HTML document.
A future release of Pedro will include a similar feature for auto-generating functional specifications.
The generation of documentation to suit a current prototype will help in cases where people want to
discuss the tool without requiring a demonstration of the software.
7.2.2 Suggested Approach
1. Add a class called “Comment” to the XML Schema used by the Pedro Configuration Tool.
2. Add list fields containing comments to various classes defined in the schema.
3. Add a new “Comment” data container class to the pedro.mda.config.* package. The
comments would not appear in the generated applications but would remain as notes for the
developer to use in discussions with project leaders and potential end-users.
4. Modify pedro.mda.config.PedroConfigurationReader to handle parsing the comment records
that appear in different parts of the configuration file.
7.2.3 Scope of Effect
This change should only affect the XML Schema for the Pedro Configuration Tool and the
PedroConfigurationReader. The Comment class won’t be used by anything until a feature is
developed to generate functional specifications.
7.3 Generate “Test” Feature
7.3.1 Description
The test feature will automatically generate prototypes based on the current state of the
configuration file being developed in the Pedro Configuration Tool. In Pedro 2.0, data modellers
usually test their work by saving the current configuration file and invoking Pedro using the
“run_pedro” script included in the release. The test feature would allow them to automatically
generate applications by pressing one button.
7.3.2 Suggested Approach
The task could be supported by a PedroPlugin called TestPedroApplicationPlugin, which is linked to the
Option Menu via the Pedro Configuration Tool.
The first task of the plugin would be to convert the current configuration model held in memory
into instances of data container classes defined in the pedro.mda.config.* package. For example,
a “record_model” record appearing in the configuration tool would have to be converted into an
instance of pedro.mda.config.RecordConfiguration. During this step, a new instance of
ConfigurationReader is populated with values derived from a tree of record model objects instead of
those parsed from a ConfigurationFile.xml file.
The next step is to launch a version of Pedro that uses the ConfigurationReader instance. The
code for the plugin might look like:
import java.io.File;

import pedro.mda.config.*;
import pedro.mda.model.*;
import pedro.mda.schema.*;
import pedro.soa.plugins.PedroPlugin;
import pedro.system.*;

class TestPedroApplicationPlugin implements PedroPlugin {
    ...
    //the void return type is an assumption; match the signature in PedroPlugin
    public void execute(PedroFormContext pedroFormContext) {
        //get the root model of the current configuration file
        RecordModel currentRecordModel = (RecordModel)
            pedroFormContext.getProperty(PedroFormContext.CURRENT_RECORD_MODEL);
        RecordModelUtility recordModelUtility = new RecordModelUtility();
        RecordModel configurationFileRootModel
            = recordModelUtility.getRootModel(currentRecordModel);

        //use the converter you developed
        PedroConfigurationReader configurationReader
            = YourConversionClass.createConfigurationReader(configurationFileRootModel);

        //set up an execution environment which can be used to launch the test application
        PedroApplicationContext targetSchemaApplicationContext
            = (PedroApplicationContext) pedroFormContext.getApplicationProperty(
                PedroApplicationContext.TARGET_SCHEMA_APPLICATION_CONTEXT);
        File targetSchemaModelDirectory
            = (File) targetSchemaApplicationContext.getProperty(
                PedroApplicationContext.MODEL_DIRECTORY);

        //develop some routine to ensure you get the name of the model folder
        String targetSchemaModelFolder
            = parseModelFolder(targetSchemaModelDirectory);

        //let Pedro’s “Startup”, “WorkspaceFileFinder” and “Workspace” classes produce
        //a new instance of PedroFormContext -- this will hold all the environment
        //variables needed to launch a test version of the application
        WorkspaceFileFinder testWorkSpaceFileFinder
            = new WorkspaceFileFinder(".", targetSchemaModelFolder, false);
        Startup testStartup = new Startup(new PedroApplicationContext());
        testStartup.start(testWorkSpaceFileFinder.getSchema(),
            testWorkSpaceFileFinder.getMainConfigurationURL(),
            testWorkSpaceFileFinder.getLibraryDirectory(),
            testWorkSpaceFileFinder.getDocumentDirectory(),
            testWorkSpaceFileFinder.getResourceDirectory(),
            testWorkSpaceFileFinder.getFileExtensionsToLaunchURL(),
            testWorkSpaceFileFinder.getSessionFile(),
            true,
            true);
        Workspace testWorkSpace
            = Workspace.createWorkSpace(testStartup);
        testWorkSpace.setWorkSpaceFiles(testWorkSpaceFileFinder);
        PedroFormContext testFormContext
            = testWorkSpace.getPedroFormContext();

        //now set the configuration reader object:
        testFormContext.setApplicationProperty(
            PedroApplicationContext.CONFIGURATION_READER, configurationReader);
    }
}
7.3.3 Scope of Effect
The scope of development should not affect existing code.
8 Overview of Code Packages
8.1 Package “pedro.configurationTool”
This package describes the code for the Pedro Configuration Tool. The tool is invoked through
PedroConfigurationTool. Class names ending in “Plugin” implement the
pedro.soa.plugins.PedroPlugin interface and are used to provide custom functionality for the
configuration tool. Most of the other classes are used to provide dialogs for the plugins.
8.2 Package “pedro.desktopDeployment”
The pedro.desktopDeployment.* package describes classes that are used to make the desktop
version of the data entry tool. It is quite large but its classes can be organised into just a few
categories. Classes that can be invoked as applications include PedroApplication and
PedroAlerts. PedroService is intended to behave as a component that operates within an
environment provided by a client application. The service form of deployment is how Pedro
operates within other service platforms such as MyGrid.
Classes that describe the behaviour of Pedro’s NavigationTree include:
• NavigationTree
• NavigationTreePanel
• NavigationTreeNode
• NavigationView
• TreeSelectionEventManager
• TreeNodeRenderer
Perhaps the most complicated class in the package is TreeSelectionEventManager, which
changes the main form to display the record which is selected in the NavigationTree. The
complexity in the code comes from having to validate the current record before jumping to the next.
NavigationView is a relatively new class which was developed to allow Pedro to treat the
NavigationTree in the desktop deployment in a similar way to the
pedro.tabletDeployment.RecordStack navigation widget used in the Tablet deployment.
Classes for rendering the main form include RecordView and all the form field classes, which are
represented by classes ending with “FieldView”. Instances of field views are produced by
RecordViewFactory.
8.3 Package “pedro.io”
This package contains classes that manage most of the features for reading and writing data to file.
The main I/O routines are PedroDataFileWriter and PedroDataFileReader, which manage a
single XML file that conforms to a schema. The XMLSubmissionFileReader and
XMLSubmissionFileWriter classes wrap these classes but provide little functionality of their own.
Although they are used to support the “Import from XML” and “Export to Final Submission
Format” features in Pedro’s file menu, they will eventually be replaced with the classes they wrap.
Other classes which will be phased out are BasicPedroFileReader and BasicPedroFileWriter,
which are artefacts from older releases.
NativeFileFormatReader and NativeFileFormatWriter manage Pedro’s *.pdz files. These
classes use the PedroDataFileWriter and PedroDataFileReader classes to write each
information layer.
Older versions of Pedro had a package pedro.dataImport.*, whose classes helped import data
from and export data to spreadsheets. In Pedro v2.0, the package was eliminated and the files were
moved into the pedro.io.* package. The following data import classes support the “Import from
Spreadsheet” and “Export to Spreadsheet” features in Pedro’s File menu:
• ExportToSpreadSheet
• FlatFileReader
• HeaderRemovalDialog
• ImportDataToFieldDialog
• ImportFromSpreadSheet
• ImportRecordSelectorDialog
• ImportTableHeaderClicker
• ImportTableModel
• RecordImporter
• RecordImportMenuItem
• TargetRecordFieldSelectorDialog
These classes used to be defined in a package called pedro.dataImport.* but have since been
moved to the pedro.io.* package.
8.4 Package “pedro.mda.config”
This package contains classes which manage the configuration options for the data entry tool which
are not covered by the XML Schema. The most significant class, PedroConfigurationReader,
reads a ConfigurationFile.xml file produced by the Pedro Configuration Tool, and uses
instances of data container classes to manage the configuration options. The
PedroConfigurationReader is referenced in many parts of the code base and provides a look-up
service to find configuration options associated with schema concepts and other parts of the
application. Most of the other classes in this package either parse parts of a
ConfigurationFile.xml file or hold configuration data.
SessionManager manages a file called SessionAspects.xml which holds information about the
most recently used files. It also uses an instance of FileLauncher to read the file
./config/FileExtensionsToLaunch.xml. This small file associates file extensions with shell
commands which launch other applications. The mappings are used when end-users press the
“View” button on a URL field view.
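The mapping from file extensions to shell commands can be pictured as a simple look-up table. The sketch below is a guess at the general idea, not a description of FileLauncher: the class, its methods and the example command are all hypothetical, and it does not parse FileExtensionsToLaunch.xml.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class ExtensionLauncherSketch {

    // Maps a lower-case file extension (without the dot) to a shell command.
    private final Map<String, String> commandByExtension = new HashMap<>();

    void register(String extension, String shellCommand) {
        commandByExtension.put(extension.toLowerCase(Locale.ROOT), shellCommand);
    }

    // Returns the command line for the given file, or null when the file has
    // no extension or the extension has no registered command.
    String commandFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String command = commandByExtension.get(extension);
        return command == null ? null : command + " " + fileName;
    }

    public static void main(String[] args) {
        ExtensionLauncherSketch launcher = new ExtensionLauncherSketch();
        launcher.register("pdf", "acroread"); // example command, not taken from Pedro
        System.out.println(launcher.commandFor("report.PDF"));
    }
}
```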
8.5 Package “pedro.mda.model”
This package describes Pedro’s native data structures. Section 4.2 provides most of the important
information about this topic. Appendices A and B provide more information about what
configuration properties are used to set attributes in native data structures such as RecordModel,
ListFieldModel, EditFieldModel and AttributeFieldModel.
8.6 Package “pedro.mda.schema”
The package contains classes which are responsible for interpreting the XML schema. Most of this
activity is encapsulated by Startup, which is used in the “main” class of almost every tool that uses
Pedro libraries. Startup uses an implementation of SchemaReaderInterface to parse the schema.
For now, that implementation remains the MsvSchemaReader class. Most of Pedro depends on this
one class for extracting schema information and using it to produce template definitions of native
data structures.
8.7 Package “pedro.metaData”
The package contains classes which manage Pedro’s meta data. The classes can be divided into two
groups:
• data container classes that hold meta data information
• classes that support the Pedro Meta Data Editor
DocumentMetaData and RecordMetaData hold meta data information about documents and records
respectively. Their attributes are described by the meta data schema described in Appendix C. Most
of the other classes service the Pedro Meta Data Editor. Most of Pedro’s normal File menu features
don’t work in the editor because they manage information in the data layer of the *.pdz file.
SaveMetaDataFile, OpenMetaDataFile, ExitMetaDataEditor and CloseMetaDataFile provide
file menu features that only affect the meta data layer of a *.pdz file.
8.8 Package “pedro.soa.alerts”
The Pedro Alerts system is based on the concept of an Alert, which is a collection of matching
criteria associated with an intent such as an error, a warning, a request for communication or a
desire to post a bulletin. Data for criteria are managed by the following classes:
• EditFieldCriterionModel
• ListFieldCriterionModel
• MatchingCriterion
• EditFieldComparator
Criteria are visualised with the help of these classes:
• EditFieldCriterionView
• ListFieldCriterionView
• CriterionView
• MatchingCriteriaView
The intent of an alert is described by states defined in the AlertActionType class.
Many of the classes such as PedroAlertsEditor, ValidationTreePanel, and AlertNode support
the UI for the Pedro Alerts Editor. Some classes such as AlertsBundle, AlertsBundleReader and
AlertWriter help read and write alerts to alert bundles, which are *.zip files that contain alerts
expressed as *.xml files.
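To make the idea of an alert more concrete, the following sketch models an alert as an intent plus a list of matching criteria. Everything here is a simplification made for illustration: the class names only loosely echo those in the package, records are reduced to maps of field names to values, and the rule that all criteria must match is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class AlertSketch {

    // The intents listed in the text; the real states live in AlertActionType.
    enum ActionType { ERROR, WARNING, REQUEST_COMMUNICATION, POST_BULLETIN }

    interface MatchingCriterion {
        boolean matches(Map<String, String> record);
    }

    // Matches when a named field holds exactly the expected value.
    static class EditFieldCriterion implements MatchingCriterion {
        private final String fieldName;
        private final String expectedValue;

        EditFieldCriterion(String fieldName, String expectedValue) {
            this.fieldName = fieldName;
            this.expectedValue = expectedValue;
        }

        public boolean matches(Map<String, String> record) {
            return expectedValue.equals(record.get(fieldName));
        }
    }

    static class Alert {
        final ActionType actionType;
        private final List<MatchingCriterion> criteria = new ArrayList<>();

        Alert(ActionType actionType) {
            this.actionType = actionType;
        }

        void addCriterion(MatchingCriterion criterion) {
            criteria.add(criterion);
        }

        // Assumption: an alert applies only when every criterion matches.
        boolean appliesTo(Map<String, String> record) {
            for (MatchingCriterion criterion : criteria) {
                if (!criterion.matches(record)) {
                    return false;
                }
            }
            return true;
        }
    }
}
```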
8.9 Package “pedro.soa.id”
This small package describes services which generate identifier values for text fields. The service is
a Java class that implements the IDGeneratorService interface. Pedro comes with its own default service,
DefaultIDGeneratorService. The creation of the services is managed by the
IDGeneratorServiceFactory.
8.10 Package “pedro.soa.ontology.provenance”
This package manages the meta data that Pedro gathers about the ontology terms used to mark-up
form fields. OntologyTermProvenance holds meta data about an ontology term. The class
manages an instance of OntologyMetaData, which holds general information about the ontology
such as its name and version. An OntologyTermProvenance object also has an
OntologyTermMetaData object that holds meta data that specifically relates to the term. The
attributes of these classes correspond to parts of the meta data schema which drives the Pedro Meta
Data Editor (see Appendix C).
OntologyTermProvenanceManager keeps track of all the terms that are used in a session. When a
user opens a file, Pedro reads the meta data file and populates the manager with terms that have
already been used to tag the document.
8.11 Package “pedro.soa.ontology.sources”
This package contains all the classes which support the ontology source class that is part of the
Pedro Ontology Service Framework. OntologySource is the main interface developers must
implement to make their own providers of ontology terms. TreeOntologySource extends the
OntologySource by supporting the notion of a tree of terms.
Most of the classes in this package support default implementations of TreeOntologySource.
AbstractTreeOntologySource contains most of the code for searching through a tree. It also
makes use of OntologyTreeCloner to return sub-ontologies that are copies of the tree that are
rooted by certain terms. TabIndentedTextSource extends AbstractTreeOntologySource and
reads terms from a tab-indented text file. XMLOntologySource extends the same class but uses its
own particular XML-based format for expressing ontology terms.
SingleColumnTextSource is an example of an ontology source that implements OntologySource
but not TreeOntologySource. It reads its terms from a text file containing a single column of
terms.
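As an illustration of how a tab-indented file can express a tree of terms, the sketch below turns indented lines into parent and child nodes. It is not the TabIndentedTextSource implementation: the Term class is a hypothetical, much-reduced stand-in for TreeOntologyTerm, and the rule that the number of leading tabs gives a term's depth is an assumption about the file format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TabIndentedTreeSketch {

    // Hypothetical minimal term node with a parent and children.
    static class Term {
        final String label;
        final Term parent;
        final List<Term> children = new ArrayList<>();

        Term(String label, Term parent) {
            this.label = label;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }
    }

    // Builds a tree from lines whose count of leading tabs gives their depth.
    static Term parse(List<String> lines) {
        Term root = new Term("root", null); // synthetic root above all top-level terms
        // lastAtDepth.get(0) is the root; a line with d tabs is stored at index d + 1.
        List<Term> lastAtDepth = new ArrayList<>();
        lastAtDepth.add(root);
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            int depth = 0;
            while (depth < line.length() && line.charAt(depth) == '\t') {
                depth++;
            }
            if (depth + 1 > lastAtDepth.size()) {
                throw new IllegalArgumentException("Line is indented too deeply: " + line.trim());
            }
            Term term = new Term(line.trim(), lastAtDepth.get(depth));
            // Drop deeper terms left over from an earlier branch, then record this one.
            while (lastAtDepth.size() > depth + 1) {
                lastAtDepth.remove(lastAtDepth.size() - 1);
            }
            lastAtDepth.add(term);
        }
        return root;
    }

    public static void main(String[] args) {
        Term root = parse(Arrays.asList("animal", "\tdog", "\t\tpuppy", "\tcat", "plant"));
        System.out.println(root.children.size() + " top-level terms");
    }
}
```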
There are a number of classes related to the representation of an ontology term. OntologyTerm is a
data container class with properties that include an identifier, a label, and a collection of related
terms. The class does not describe how terms are related. However, the relations are inferred by
the OntologyRelationshipType parameter that is used when an OntologySource returns a
collection of terms related to a given term. TreeOntologyTerm extends OntologyTerm to include a
notion of a parent term. Most of the default ontology source implementations use
TermIdentifierUtility to create default identifiers for terms.
The package also includes a number of marker interfaces that provide rendering hints for Pedro’s
pedro.soa.ontology.views.DefaultOntologyViewer class. ImageDescriptionSupport
implies that most terms will have an associated image. PictureOntologySource, a sub-class of
XMLOntologySource, uses this interface. The viewer uses the support of this interface to make it
show terms as a collection of thumbnail images. DictionaryDescriptionSupport implies that
most terms are associated with a definition. The viewer uses this information to support a tabular
view of terms with columns for term and definition. URLDescriptionSupport implies that a
source will have a help web page for most terms. Sources which implement this interface cause the
viewer to include an HTML panel to present the web page for the currently selected ontology term.
8.12 Package “pedro.soa.ontology.views”
This package contains all the classes which support the ontology viewer that is part of the Pedro
Ontology Service Framework. The OntologyServiceManager is responsible for listening to right
click actions end-users make when their mouse cursor hovers over the label of a form field which
supports ontology services. When this happens, the manager class produces a popup menu with
links to OntologyServices that have been associated with the field. If an ontology contains at
most 40 terms, Pedro uses instances of OntologyTermMenuItem to render ontology terms as menu
items. If the service has more than 40 terms, it delegates to an OntologyViewer.
If the OntologyService has no OntologyViewer, Pedro uses its own default viewer
DefaultOntologyViewer. This class interrogates OntologySource objects to determine what
other interfaces they implement. It uses the presence of other interface implementations to
determine which of the following default views it can use to render an ontology:
• ListView - renders all pedro.soa.ontology.sources.OntologySource objects.
• TreeView - renders all pedro.soa.ontology.sources.TreeOntologySource objects. Note that
the items shown in the TreeView’s tree display are instances of OntologyTermNode. This
class extends javax.swing.tree.DefaultMutableTreeNode.
• DictionaryView - renders sources which also implement
pedro.soa.ontology.sources.DictionaryDescriptionSupport. Note that
DictionaryView is a JTable which commits its data to an instance of the
DictionaryTableModel.
• PictureView - renders sources which also implement
pedro.soa.ontology.sources.ImageDescriptionSupport.
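The way a viewer can choose views by interrogating a source for marker interfaces can be sketched with plain instanceof checks. The interfaces below are empty stand-ins that only borrow the names used in the text, PictureTreeSource is a hypothetical source, and the method is not taken from DefaultOntologyViewer.

```java
import java.util.ArrayList;
import java.util.List;

class ViewSelectionSketch {

    // Empty stand-ins for the interfaces named in the text.
    interface OntologySource {}
    interface TreeOntologySource extends OntologySource {}
    interface DictionaryDescriptionSupport {}
    interface ImageDescriptionSupport {}

    // Lists the names of the default views that could render the given source.
    static List<String> applicableViews(OntologySource source) {
        List<String> views = new ArrayList<>();
        views.add("ListView"); // every OntologySource can be shown as a list
        if (source instanceof TreeOntologySource) {
            views.add("TreeView");
        }
        if (source instanceof DictionaryDescriptionSupport) {
            views.add("DictionaryView");
        }
        if (source instanceof ImageDescriptionSupport) {
            views.add("PictureView");
        }
        return views;
    }

    // Hypothetical source: a tree of terms where most terms have an image.
    static class PictureTreeSource implements TreeOntologySource, ImageDescriptionSupport {}

    public static void main(String[] args) {
        System.out.println(applicableViews(new PictureTreeSource()));
    }
}
```

Because the marker interfaces carry no methods, a source opts in to a richer view simply by declaring the interface.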
The default viewer is able to interact with the views in a consistent manner because they all
implement the OntologyView interface. This is an interface that was specifically developed to
allow the default viewer to support multiple ways of presenting ontologies to end-users. The views
register the default viewer as an OntologyViewListener. This interface is used to alert
DefaultOntologyViewer when terms have been selected in one of the views.
When users have chosen their terms in some kind of OntologyViewer, the viewer notifies an
OntologyTermSelectionListener. This class is responsible for recording the terms that have
been used and inserting the term labels into the appropriate form field. Pedro has its own
DefaultOntologyTermSelectionListener class which supports this interface.
8.13 Package “pedro.soa.plugins”
This package contains all the classes that are needed for developers to extend the system with their
own plugins. PedroPlugin is the main interface that developers must use if they are making their
own plugins. The class implementing PedroPlugin can also implement a number of marker
interfaces such as DataExportPlugin, DataImportPlugin, ValidationPlugin and
AnalysisPlugin. When Pedro registers plugins for the current form, it shows a count of import,
export and analysis plugins on the status bar.
At startup, Pedro uses PluginFileFilter to identify JAR files that end in the *.plugins extension.
The PluginLoader examines each Java class in these jar files to determine whether they implement
the PedroPlugin interface and other marker interfaces. It also determines whether a plugin can be
applied to the current record type being displayed.
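The check that a loaded class implements PedroPlugin and any marker interfaces can be done with Class.isAssignableFrom. The sketch below shows only that check, under stated assumptions: the interfaces are empty stand-ins for the real ones, the sample classes are hypothetical, and scanning JAR files is left out entirely.

```java
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class PluginDetectionSketch {

    // Empty stand-ins for the interfaces named in the text.
    interface PedroPlugin {}
    interface DataImportPlugin {}
    interface DataExportPlugin {}
    interface ValidationPlugin {}
    interface AnalysisPlugin {}

    // True for concrete classes that implement the main plugin interface.
    static boolean isUsablePlugin(Class<?> candidate) {
        return PedroPlugin.class.isAssignableFrom(candidate)
            && !candidate.isInterface()
            && !Modifier.isAbstract(candidate.getModifiers());
    }

    // Reports which marker interfaces a candidate class implements.
    static List<String> roles(Class<?> candidate) {
        List<String> roles = new ArrayList<>();
        if (DataImportPlugin.class.isAssignableFrom(candidate)) roles.add("import");
        if (DataExportPlugin.class.isAssignableFrom(candidate)) roles.add("export");
        if (ValidationPlugin.class.isAssignableFrom(candidate)) roles.add("validation");
        if (AnalysisPlugin.class.isAssignableFrom(candidate)) roles.add("analysis");
        return roles;
    }

    // Hypothetical sample classes used to exercise the checks.
    static class SpreadsheetExporter implements PedroPlugin, DataExportPlugin {}
    abstract static class UnfinishedPlugin implements PedroPlugin {}

    public static void main(String[] args) {
        System.out.println(isUsablePlugin(SpreadsheetExporter.class)
            + " " + roles(SpreadsheetExporter.class));
    }
}
```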
If the currently displayed record has plugins associated with it, Pedro renders a button in the top
right corner of the form. If a field is associated with plugins, the tool renders the same kind of
button at the end of the form field. When end-users press the button, an instance of
PluginSelectionDialog appears. The dialog presents the end-users with a list of the available
plugins.
8.14 Package “pedro.soa.security”
This package contains basic classes for supporting a security service. The service is used to determine whether a
given User can access application features such as ontology services, form buttons, and menu
items. For now, Pedro relies on a DummySecurityService which allows full access to any feature.
In future, other implementations of the SecurityService will be used to mask document data,
restrict some features and provide others as part of a system of user preferences.
8.15 Package “pedro.soa.validation”
The classes in this package support Pedro’s facilities for validating the data set. There are three
main interfaces that developers can implement to create their own validation services.
DocumentValidationService is for services which validate the contents of an entire data set.
They are triggered whenever an end-user attempts to use the “Export to Final Submission Format”
button in the File Menu or the “Show Errors” button in the View Menu.
RecordModelValidationService is for services which are meant to identify illegal combinations
of form field values. This service is triggered whenever the end-users press “Keep” or “Done”
buttons on the main record form. FieldValidationService is for services which validate the
contents of a particular field. EditFieldValidationServices are used to validate a single field
value, whereas a ListFieldValidationService will be used to identify errors in the number and
type of child records a list field contains.
Most of the classes in the package focus on default field validation services which perform type
checking on field values. All of them extend AbstractEditFieldValidationService which contains
code for managing the field name and for determining whether a field value is empty. The rest of
the classes apply type checking to field values which are non-empty. Pedro has a separate
validation service to scan for required fields which are left empty.
Pedro initiates a validation activity via the ValidationFacility class, which in turn applies
appropriate field, record and document level validation services.
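The division of labour between the abstract base class and the type-checking services might look like the sketch below. It is a simplified illustration rather than Pedro's code: the method names, the String-based field values and the convention of returning null for a valid value are all assumptions.

```java
class FieldValidationSketch {

    // Hypothetical reduction of the base class: it manages the field name and
    // the emptiness test, and leaves type checking to subclasses.
    abstract static class AbstractEditFieldValidationService {
        private final String fieldName;

        AbstractEditFieldValidationService(String fieldName) {
            this.fieldName = fieldName;
        }

        String getFieldName() {
            return fieldName;
        }

        boolean isEmpty(String value) {
            return value == null || value.trim().isEmpty();
        }

        // Returns null when the value is acceptable, otherwise an error message.
        final String validate(String value) {
            if (isEmpty(value)) {
                return null; // empty required fields are left to a separate service
            }
            return checkType(value.trim());
        }

        abstract String checkType(String nonEmptyValue);
    }

    // Type-checking service for fields that should hold an integer.
    static class IntegerFieldValidationService extends AbstractEditFieldValidationService {
        IntegerFieldValidationService(String fieldName) {
            super(fieldName);
        }

        String checkType(String nonEmptyValue) {
            try {
                Integer.parseInt(nonEmptyValue);
                return null;
            } catch (NumberFormatException e) {
                return getFieldName() + ": \"" + nonEmptyValue + "\" is not an integer";
            }
        }
    }
}
```

Keeping the emptiness test in the base class means each type checker only ever sees non-empty values, which matches the separation described above.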
8.16 Package “pedro.tabletDeployment”
The package describes classes that are used to create the TabletPC deployment of the data entry
tool. Most of the classes have the same names as others which are used in the desktop deployment.
8.17 Package “pedro.util”
The following classes relate to Pedro’s context-sensitive help system:
• ContextHelpItem
• ContextHelpService
• HelpEnabledButton
• HelpEnabledCheckBox
• HelpEnabledCheckBoxMenuItem
• HelpEnabledLabel
• HelpEnabledMenuItem
• HelpLinkCreator
• HelpLinkListener
Other classes relate to file filters that are used to limit file searches to include certain types of files:
• XMLFileFilter
• XSDFileFilter
• ZIPFileFilter
• HTMLFileFilter
• PedroFileFilter
• PedroBackupFileFilter
Most of the remaining classes order or display items in a list.
8.18 Package “pedro.system”
The most important classes in this package are Context classes that hold a collection of
environment variables. These variables refer to different parts of the Pedro Application. The
context classes include:
• Context
• PedroFormContext
• PedroDocumentContext
• PedroApplicationContext
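A context class can be thought of as a named collection of properties. The sketch below is one possible arrangement, not Pedro's actual class hierarchy: the class names are shortened, the idea that several form contexts delegate to one shared application context is an assumption, and only the getProperty() and getApplicationProperty() style of access is borrowed from the code examples in this document.

```java
import java.util.HashMap;
import java.util.Map;

class ContextSketch {

    // Base class: a bag of environment variables looked up by name.
    static class Context {
        private final Map<String, Object> properties = new HashMap<>();

        Object getProperty(String name) {
            return properties.get(name);
        }

        void setProperty(String name, Object value) {
            properties.put(name, value);
        }
    }

    // Holds variables shared by the whole application.
    static class ApplicationContext extends Context {}

    // Holds variables for one form and delegates application-wide look-ups.
    static class FormContext extends Context {
        private final ApplicationContext applicationContext;

        FormContext(ApplicationContext applicationContext) {
            this.applicationContext = applicationContext;
        }

        Object getApplicationProperty(String name) {
            return applicationContext.getProperty(name);
        }

        void setApplicationProperty(String name, Object value) {
            applicationContext.setProperty(name, value);
        }
    }

    public static void main(String[] args) {
        ApplicationContext application = new ApplicationContext();
        FormContext firstForm = new FormContext(application);
        FormContext secondForm = new FormContext(application);
        firstForm.setApplicationProperty("configurationReader", "shared reader");
        System.out.println(secondForm.getApplicationProperty("configurationReader"));
    }
}
```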
PedroUIFactory is used extensively throughout the application. It centralises the creation of all
UI components. PedroResources is also used in many classes to provide the String values that
appear in UI components.
ModelSelectorDialog is the dialog that allows end-users to run Pedro with a model.
8.19 Package “pedro.soa”
This package contains ServiceClass, the basic service class, and GeneralServiceFactory, the basic factory
class for creating services. The package also contains interfaces for making components that can
edit fields.
8.20 Package “pedro.workBench”
This is a simple package meant to provide a display for activating all the other Pedro tools. The
main class is PedroWorkBench.
9 Index
Appendix A: Schema for the Pedro Configuration Tool
This appendix describes the XML Schema that drives the Pedro Configuration Tool. The schema
defines the record structures that hold configuration data when the tool is running. Many of the
schema classes correspond to classes that appear in the code base. For example, the
“ontology_service” definition corresponds to pedro.soa.ontology.views.OntologyService and the
“record_model” definition corresponds to pedro.mda.model.RecordModel.
The file ConfigurationFile.xml which appears in the ./config directory of each model folder should
validate against this schema.
A.1 Configuration Options for Menu Features
Classes which represent properties of Pedro menus are illustrated in Figure A-1. The following
subsections describe properties of menus that appear in the menu bar of a Pedro dialog.
Figure A-1: part of the Pedro configuration schema that relates to properties of menu features.
A.1.1 Class: “menu_features”
Property          Description
existing_menus    collection of menu configuration records that describe the
                  functionality of the standard File, Edit, Option, View and Help
                  application menus.
custom_menu       a collection of custom_menu objects that describe the features of
                  custom menus.
A.1.2 Class: “existing_menus”
Property               Description
file_menu              configuration record for the File menu; if it is absent the menu is not
                       included in the menu bar of the generated application.
edit_menu              configuration record for the Edit menu; if it is absent the menu is not
                       included in the menu bar of the generated application.
view_menu              configuration record for the View menu; if it is absent the menu is not
                       included in the menu bar of the generated application.
options_menu           configuration record for the Options menu; if it is absent the menu is
                       not included in the menu bar of the generated application.
help_menu              configuration record for the Help menu; if it is absent the menu is not
                       included in the menu bar of the generated application.
include_window_menu    Pedro has a “Windows” menu that shows a list of files that are
                       currently open. If this configuration value is true, the menu will
                       appear. Otherwise, the Windows menu will not appear in the
                       generated application.
A.1.3 Class: “file_menu”
Property                               Description
show_new_file                          include the “New...” menu item
show_favourites                        include the “” menu item
show_spreadsheet_options               include the “Import from Spreadsheet” and “Export to
                                       Spreadsheet” menu items
show_open_file                         include the “Open” menu item
show_save_file                         include the “Save” menu item
show_saveAs_file                       include the “Save As...” menu item
show_close                             include the “Close” menu item
show_import_records                    deprecated. Import records used to be a menu where
                                       I/O plugins appeared. This is now not necessary
                                       because of the way the new plugins system works.
show_import_from_xml                   include the “Import from XML...” menu item
show_export_final_submission_format    include the “Export to Final Submission Format”
                                       menu item
show_templates                         include the menu items “Load Template” and “Save
                                       Template”
show_load_template                     include the “Load Template” menu item
show_save_template                     include the “Save Template...” menu item
show_exit                              include the “Exit” menu item
plugin                                 collection of plugin objects that describe customised
                                       application features.
A.1.4 Class: “edit_menu”
Property      Description
show_copy     include the “Copy” menu item. This feature allows end-users to copy
              text from a form field or copy a sub-tree of records.
show_paste    include the “Paste” menu item. This feature allows end-users to paste
              text into the current field or a sub-tree of records into the current
              record.
plugin        collection of plugin objects that describe customised application
              features.
A.1.5 Class: “options_menu”
Property                  Description
show_describe_document    include the “” menu item
show_alerts               include the “” menu item
plugin                    collection of plugin objects that describe customised application
                          features.
A.1.6 Class: “view_menu”
Property             Description
show_errors          include the “Show Errors” menu item
show_dependencies    include the “Show Dependencies” menu item
show_changes         include the “Show Changes” menu item
show_search          include the “Show Search” menu item
show_clear           include the “Show Clear” menu item
plugin               collection of plugin objects that describe customised application
                     features.
A.1.7 Class: “help_menu”
Property
Description
show_about
include the “Show About” menu item
show_schema_information
include the “Show Schema Information” menu item
show_context_help
include the “Enable Context Help” menu item
help_document
a collection of help_document objects. A help_document is used to
render a help menu button which is associated with a pop-up web
page.
plugin
collection of plugin objects that describe customised application
features.
A.1.8 Class: “help_document”
Property
Description
label
the name of the menu item representing the link to a help web page
link
a URL or a local file path for a web page.
A.1.9 Class: “plugin”
Property
Description
name
name of the plugin; this is used to represent the plugin in lists and
menus.
feature_code
a unique identifier for the feature. In future, this will be used along
with a User object to have a security service determine whether a
plugin should appear in the application.
class_name
a Java class that implements the interface pedro.plugins.PedroPlugin
list_order
not yet implemented; the list order helps define the order in which a
group of plugins is displayed.
tool_tip
text that hovers over a user interface object which represents the
plugin (eg: a menu item)
description
a description of what the plugin does
is_persistent
determines whether pedro.soa.plugins.PluginFactory creates multiple
instances of a plugin or a single instance of a plugin through the
lifetime of an application session.
A.1.10 Class: “custom_menu”
Property
Description
name
name of the menu
feature_code
code used to uniquely identify an application feature. In future
releases, this will be used by a security service to determine whether a
given user can access it or not.
position
the relative position a menu has with respect to other custom menus
that appear in the menu bar.
tool_tip
help text which appears when an end-user’s mouse cursor hovers over
the menu item
help_link
a web page associated with the menu. This will be shown if
context-sensitive help is activated.
plugin
collection of plugin objects that describe customised application
features.
A.2 Configuration Options for Record Structures
Classes that describe configuration properties for Pedro’s native data structures are shown in Figure
A-2:
Figure A-2: part of the Pedro configuration schema that describes data structures
The following tables describe the properties of the major classes in the class diagram.
A.2.1 Class: “schema_concept_field”
Note that this class doesn’t actually appear in the Pedro configuration schema. It is included to
make diagramming easier. The following properties represent attributes in the native data structure
pedro.mda.model.DataFieldModel.
Property
Description
name
name of the schema concept
ontology_identifier
an identifier which represents an XML Schema form concept as an
ontology identifier. Pedro’s Ontology Service Framework allows
ontology services to ask Pedro questions about what concepts appear
in the current form. This information can include what field called the
service, what other fields exist, the kind of record currently being
displayed and other ontology terms that have been used to mark up
other form fields. The identifier makes XML schema concepts more
compatible with formalisms used by some kinds of ontology
technologies.
tool_tip
help text which appears over the User Interface object that represents a
schema concept. This can be the title of the record form or the labels
of form fields.
help_link
a URL for a web page that describes the schema concept for the
end-users.
form_comments
comments associated with a schema concept which appear on the
form. For a record, the form comments will appear immediately under
the record title on the main form. Comments for fields will appear
immediately above them.
plugin
a collection of plugins associated with the concept. If plugins are
present for a record, Pedro will render a “Plugins...” button in the top
right corner of the main form. Field-level plugins are represented by
the same button, which appears at the end of a form field.
A.2.1 Class “record”
The “record” class in the schema describes configuration properties that are used to set attributes of
pedro.mda.model.RecordModel.
Property
Description
attribute_field
a collection of fields which have been identified as identifier
fields in the XML Schema. Such fields will make use of “ID” and
“IDREF” properties.
edit_field
a collection of edit fields that each support a single scalar value.
list_field
a collection of list fields, each of which may hold one or more
records of one or more types.
record_validation_service
a validation service designed to validate combinations of form field
values.
A.2.2 Class “list_field”
The “list_field” class describes configuration properties that are used to set attributes of
pedro.mda.model.ListFieldModel.
Property
Description
list_field_editing_service
a collection of list_field_editing_service objects. Each object
describes an editing component that is invoked whenever a user
presses the “New” or “Edit” buttons to make a new record of a
particular child record type.
A.2.3 Class “edit_field”
The “edit_field” class describes configuration properties that are used to set attributes of
pedro.mda.model.EditFieldModel.
Property
Description
default_value
the value that will be used to populate a new record when it is first
created and displayed in the main form.
units
units associated with the field value (eg: cm, hrs, km/h...)
allow_free_text
determines whether a text field accepts text or not. This is used
with ontology services to make form fields that accept free text,
only ontology terms selected from a service, or a combination of
the two.
is_scrolling_text_field
determines whether a text field should be rendered with a single
text line or in a text area enclosed within a scroll pane.
is_display_name_component
whether an edit field value is used as part of the display name that
advertises the parent record model in lists.
field_validation_service
a collection of field_validation_service objects, each of which
describes a service which performs an error check on the field
value.
ontology_service
a collection of ontology_service objects, each of which describes
an ontology service that marks-up form fields with ontology
terms.
A.2.4 Class “attribute_field”
The “attribute_field” class describes configuration properties that are used to set attributes of
pedro.mda.model.EditFieldModel.
Property
Description
id_generator_service
an id_generator_service object that describes the service used to
provide identifier values for an attribute field.
A.3 Configuration Options for Service Classes
Figure A.3-1: the part of the Pedro configuration schema that describes services
Figure A.3-1 describes the classes defined in the XML Schema which hold configuration options
about Pedro’s services. Note that service_class doesn’t actually exist in the
configuration schema but is included here for ease of diagramming the classes. The following
tables describe the schema properties in detail. Most of the classes inherit all their attributes from
service_class and do not have their own tables of attribute descriptions.
A.3.1 Class “service_class”
Properties from this synthetic schema class correspond to properties found in
pedro.soa.ServiceClass.
Property
Description
class_name
the fully qualified path of a Java class.
parameter
a collection of parameter objects, which are name-value pairs.
Parameters are used to initialise the service.
is_persistent
if the value is false then service factories will produce a new
instance of the service. If the value is true the factories will
return a single managed instance of a service.
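The effect of is_persistent can be illustrated with a minimal sketch. It is not the code of the pedro.soa factories: the class ServiceFactorySketch and its method getService are hypothetical names, java.util.ArrayList stands in for a configured service class, and the initialisation of a service from its parameter objects is left out. The sketch also assumes that the service class has a public no-argument constructor.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a service factory; not the real pedro.soa classes. */
class ServiceFactorySketch {

    // Managed instances of persistent services, keyed by class name.
    private final Map<String, Object> managedInstances = new HashMap<>();

    /**
     * Returns a service for the given class name.
     * isPersistent == false: a new instance is created on every call.
     * isPersistent == true: a single managed instance is created once and reused.
     */
    Object getService(String className, boolean isPersistent) {
        if (!isPersistent) {
            return instantiate(className);
        }
        Object service = managedInstances.get(className);
        if (service == null) {
            service = instantiate(className);
            managedInstances.put(className, service);
        }
        return service;
    }

    // Assumes the service class has a public no-argument constructor.
    private static Object instantiate(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create service " + className, e);
        }
    }

    public static void main(String[] args) {
        ServiceFactorySketch factory = new ServiceFactorySketch();
        // java.util.ArrayList is only a stand-in for a configured service class.
        Object first = factory.getService("java.util.ArrayList", true);
        Object second = factory.getService("java.util.ArrayList", true);
        Object fresh = factory.getService("java.util.ArrayList", false);
        System.out.println(first == second); // prints "true"
        System.out.println(first == fresh);  // prints "false"
    }
}
```

Keeping the managed instances in a map keyed by class name is only one possible design; a real factory might also key them on the configured parameters.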
A.3.2 Class “ontology_service”
This schema class holds configuration options that are used to set
pedro.soa.ontology.views.OntologyService.
Property
Description
name
name of the ontology service. This name will be displayed in
lists of services for the end-user.
feature_code
a unique identifier for the ontology service. This will be used
later in projects that are trying to use the same services to
perform semantic searches in data repositories.
description
description of the ontology the service provides.
ontology_source
the component that provides ontology terms; this is optional but
one of source or viewer must be present in an ontology service
ontology_viewer
the component that views ontology terms; this is optional but one
of source or viewer must be present in an ontology service
A.3.3 Class “list_field_editing_service”
Pedro supports the use of components for editing edit and list fields. When these are specified, the
components are invoked for editing rather than another Pedro form. An edit field can have exactly
one kind of editing service. However, in a multiple-type list, there could be a list field editing
service for each supported child record type. This 1:N relationship is why
Property
Description
name
name of the ontology service. This name will be displayed in
lists of services for the end-user.
editing_component_class_name
fully qualified path of the Java class that implements the
editing service
record_class_name
A.3.4 Interfaces Implemented by Service Classes and Plugins
A number of schema classes used to describe services have a field that holds a class name which
implements some kind of interface. The following table indicates what interface should be
implemented by a Java class associated with services:
Service Type described in the XML Schema
Interface expected to be implemented by service class
ontology_source
pedro.soa.ontology.sources.OntologySource
or
pedro.soa.ontology.sources.TreeOntologySource
ontology_viewer
pedro.soa.ontology.views.OntologyViewer
field_validation_service
pedro.soa.validation.FieldValidationService
or
pedro.soa.validation.EditFieldValidationService
or
pedro.soa.validation.ListFieldValidationService
record_validation_service
pedro.soa.validation.RecordModelValidationService
document_validation_service
pedro.soa.validation.DocumentValidationService
plugin
pedro.soa.plugins.PedroPlugin
id_generator_service
pedro.soa.id.IDGeneratorService
list_field_editing_service
pedro.soa.ListFieldEditingComponent
editing_component_class_name
pedro.soa.EditFieldEditingComponent
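A configuration reader could check these expectations before a service is used. The following is a minimal sketch of such a check and is not taken from the Pedro code base: InterfaceCheckSketch and implementsInterface are hypothetical names, and JDK types stand in for the Pedro interfaces so that the example runs without the Pedro libraries on the class path.

```java
/** Hypothetical sketch of an interface conformance check for configured class names. */
class InterfaceCheckSketch {

    /**
     * Returns true if the named class can be loaded and implements
     * (or extends) the expected interface or class.
     */
    static boolean implementsInterface(String className, Class<?> expected) {
        try {
            Class<?> candidate = Class.forName(className);
            return expected.isAssignableFrom(candidate);
        } catch (ClassNotFoundException e) {
            // A class name that cannot be resolved cannot satisfy the contract.
            return false;
        }
    }

    public static void main(String[] args) {
        // JDK types are stand-ins; with Pedro on the class path, the second argument
        // would be the interface listed in the table for the service type.
        System.out.println(implementsInterface("java.util.ArrayList", java.util.List.class)); // prints "true"
        System.out.println(implementsInterface("java.lang.String", java.util.List.class));    // prints "false"
        System.out.println(implementsInterface("no.such.Plugin", java.util.List.class));      // prints "false"
    }
}
```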
Appendix B: Mapping XML Schema Attributes to Application
Properties
The following table shows how code fragments of an XML Schema influence the creation of
template records and the generation of forms.
XML Schema Construct
Effect on Form Generation

<xs:element name="Organism">
  <xs:complexType>
    <xs:sequence>
      ...
      (field definitions)
      ...
    </xs:sequence>
  </xs:complexType>
</xs:element>
this structure represents a record form called “Organism”. Information about the form is used
to create a template of a RecordModel object. “Organism” would be used to set the
record_class_name attribute.

<xs:element name="species_name" .../>
represents an edit form field that holds one value. Information about the field is used to
create a template of an EditFieldModel. “species_name” would be used to set the name
attribute.

<xs:element ref="Organism"/>
or
<xs:group ref="ProcessingStep" .../>
...
<xs:group name="ProcessingStep">
  <xs:choice>
    <xs:element ref="FreezeSample" minOccurs="0"/>
  </xs:choice>
</xs:group>
...
represents a list field. “ref” indicates the field should be rendered as a list.
The <xs:element..> example indicates the list will support only records of type “Organism”.
The <xs:group...> example indicates the list will support multiple types of records defined in
an <xs:group...> declaration. In the example, one of the record types supported in the list
would be “FreezeSample”. In the top example, “Organism” would be used to set the
childTypes attribute of ListFieldModel. In the bottom example, an array of Strings
including “FreezeSample” would be used to set the same childTypes attribute in
ListFieldModel.
The value for childTypes is used to determine the rendering hint expressed in the
fieldViewType of DataFieldModel. The hint describes whether a list should be rendered to
support one or multiple types of children.

<... minOccurs="0"/>
or
<... minOccurs="1"/>
determines whether a form field is optional or required. A minOccurs value of “0” means the
field is optional. A value of “1” indicates the field is required. All other values are ignored by
the schema reader. The minOccurs value is used to set the “isRequired” field of
DataFieldModel.

<... maxOccurs="1"/>
or
<... maxOccurs="unbounded"/>
determines whether a list field holds one item or multiple items. The maxOccurs value is used to
set the fieldViewType of DataFieldModel with a rendering hint for multiple-item or
single-item lists.

<... type="xs:string" .../>
or
<... type="xs:integer" .../>
or
<... type="xs:float" .../>
or
<... type="xs:decimal" .../>
or
<... type="xs:double" .../>
or
<... type="xs:positiveInteger" .../>
or
<... type="xs:date" .../>
indicates which type-based field validation service will be associated with an edit field.

<... type="xs:boolean" .../>
causes Pedro to render the field as a boolean form field with two radio buttons for “true” and
“false”. This declaration is used to set “true” and “false” values for the choices attribute of
GroupFieldModel.

<... type="xs:anyURI" .../>
indicates an edit field that can take either a URL or a file path. The field is rendered as a
text field that is accompanied by a “Browse” button which allows end-users to search for a
file.

<xs:element name="organism_type">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="mammal"/>
      <xs:enumeration value="bird"/>
      <xs:enumeration value="amphibian"/>
      <xs:enumeration value="reptile"/>
      <xs:enumeration value="fish"/>
      <xs:enumeration value="micro-organism"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
an <xs:restriction...> tag with enumerations is rendered as a combination field.
If there are three or fewer enumerations, the field is rendered as a collection of radio fields. If
there are more than three enumerations, the field is rendered as a drop-down list of choices.
The enumerations are used to set the choices attribute of GroupFieldModel.

<xs:element name="sample_code">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Z]*"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
an <xs:restriction...> tag that contains an <xs:pattern../> tag will cause Pedro to
render a text field which is associated with a validation service that checks whether the value
complies with a regular expression.
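Three of the rules in the table above can be sketched in a few lines of Java. This is an illustration only and not Pedro’s schema reader: the class FormHintSketch, its method names and the widget names RADIO_BUTTONS and DROP_DOWN_LIST are all hypothetical. The pattern check uses java.util.regex, whose syntax is close to but not identical to the regular expression syntax of XML Schema, so it should be read as an approximation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of some Appendix B rules; not Pedro's schema reader. */
class FormHintSketch {

    /** minOccurs "0" means optional and "1" means required; other values are ignored (null). */
    static Boolean isRequired(String minOccurs) {
        if ("0".equals(minOccurs)) {
            return Boolean.FALSE;
        }
        if ("1".equals(minOccurs)) {
            return Boolean.TRUE;
        }
        return null;
    }

    /** Three or fewer enumerations give radio buttons; more give a drop-down list. */
    static String choiceWidget(List<String> enumerations) {
        return enumerations.size() <= 3 ? "RADIO_BUTTONS" : "DROP_DOWN_LIST";
    }

    /** Checks a field value against an xs:pattern expression; the whole value must match. */
    static boolean matchesPattern(String regex, String value) {
        return Pattern.matches(regex, value);
    }

    public static void main(String[] args) {
        List<String> organismTypes = Arrays.asList(
            "mammal", "bird", "amphibian", "reptile", "fish", "micro-organism");
        System.out.println(isRequired("1"));                 // prints "true"
        System.out.println(choiceWidget(organismTypes));     // prints "DROP_DOWN_LIST"
        System.out.println(matchesPattern("[A-Z]*", "ABC")); // prints "true"
        System.out.println(matchesPattern("[A-Z]*", "AB12")); // prints "false"
    }
}
```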
Appendix C: Schema for the Pedro Meta Data Editor
The Pedro Meta Data Editor is driven off the schema described in Figure C-1:
Figure C-1: the schema describing meta data managed by Pedro
The pedro_meta_data class holds the summary data about a document. It will have instances of
record_meta_data for each record type that appears in the data layer. A record_meta_data
object will hold the number of times that a record type appears in the document, and it will have a
collection of field_meta_data objects. The field_meta_data objects hold a collection of
ontology terms that are used to mark-up form fields. The ontology_term class holds all the
provenance data about a term. Most of the fields are borrowed from the SKOS standard that
describes ontology meta data. Fields prefixed by “ontology_service” describe the software
service, not the terms themselves. This might be important for software agents which need to know
about the service providing the terms. Items such as super_class and characteristic are string
identifiers, but they have been expressed in their own separate classes. This is because Pedro
doesn’t have a mechanism for displaying arrays of simple types. To support lists, these fields have
to be expressed as separate classes. Other fields that appear at most once are also expressed in their
own classes, but at some point they may be recast as a single value edit field within the ontology
term class. The following tables describe the class properties in detail.
C.1 Class: “pedro_meta_data”
This class represents the top level form that appears in the Pedro Meta Data Editor. Most of its
fields describe general information about the document.
Property
Description
title
title of the document; provided by the end-user through the
“Describe this document” feature in the Options menu of the
Pedro editor.
author
author that produced the document; provided by the end-user
through the “Describe this document” feature in the Options
menu of the Pedro editor.
institution
institution that produced the document; provided by the end-user
through the “Describe this document” feature in the Options
menu of the Pedro editor.
document_description
a summary that describes the nature of the document; provided
by the end-user through the “Describe this document” feature in
the Options menu of the Pedro editor.
record_meta_data
collection of record_meta_data objects that retain information
about record types which appear in the document
C.2 Class: “record_meta_data”
This class describes meta data for a record type defined in the target schema.
Property
Description
name
a record type defined in the target schema
frequency
the number of times the record type appears in the data layer of
the document
field_meta_data
collection of field_meta_data objects that hold meta data
about the edit fields which are defined in the target schema for
the given record_type
C.3 Class: “field_meta_data”
This class describes meta data for a field defined in the target schema.
Property
Description
name
the name of an edit field defined in the target schema. The edit
field belongs to the record type described in the “name” field of
the record_meta_data class.
ontology_term
a collection of ontology term instances that record meta data
about terms used to mark-up form fields.
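The relationship between record_meta_data and field_meta_data can be sketched as plain Java classes. This is a minimal illustration of how the frequency property could be derived from the record types found in a document’s data layer; it is not the Pedro Meta Data Editor’s code. MetaDataSketch, its nested classes and the summarise method are hypothetical names, and ontology terms are reduced to identifier strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the meta data classes; not the Pedro Meta Data Editor. */
class MetaDataSketch {

    /** Meta data for one edit field: its name and the ontology terms used to mark it up. */
    static class FieldMetaData {
        final String name;
        final List<String> ontologyTermIdentifiers = new ArrayList<>();

        FieldMetaData(String name) {
            this.name = name;
        }
    }

    /** Meta data for one record type: how often it occurs and its field meta data. */
    static class RecordMetaData {
        final String name;
        int frequency;
        final List<FieldMetaData> fieldMetaData = new ArrayList<>();

        RecordMetaData(String name) {
            this.name = name;
        }
    }

    /** Counts how many times each record type appears, in first-seen order. */
    static Map<String, RecordMetaData> summarise(List<String> recordTypesInDocument) {
        Map<String, RecordMetaData> byType = new LinkedHashMap<>();
        for (String type : recordTypesInDocument) {
            RecordMetaData meta = byType.computeIfAbsent(type, RecordMetaData::new);
            meta.frequency++;
        }
        return byType;
    }

    public static void main(String[] args) {
        Map<String, RecordMetaData> summary =
            summarise(Arrays.asList("Organism", "FreezeSample", "Organism"));
        System.out.println(summary.get("Organism").frequency);     // prints "2"
        System.out.println(summary.get("FreezeSample").frequency); // prints "1"
    }
}
```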
C.4 Class: “ontology_term”
The ontology_term class holds provenance data about ontology terms used to mark-up form fields.
Most of the fields are described in the SKOS standard which describes aspects of ontology meta
data. Most of the field values will be provided automatically by ontology services. However, they
will remain editable in the Pedro Meta Data Editor to allow data curators to update information
about terms.
Property
Description
ontology_service_code
a code that uniquely identifies the ontology service which was
used to mark up the form field.
ontology_service_name
name of the ontology service
ontology_service_description
description of what the ontology service does.
ontology_service_version
version of the software used to make the ontology service
ontology_service_formalism
different ways of expressing an ontology. Examples include
DagEdit and OWL.
ontology_service_email
the e-mail of a contact person associated with the development
of the ontology service; for example the programmer who
maintains the code for the service
type
identifier
label
definition
comment
example
status
version_information
image
the name of an image file that represents the term, eg: an
anatomy diagram or a diagram describing the structure of a
chemical compound.
issued
modified
deprecated
super_class
super_property
domain
range
characteristic
inverse_of
replaces
replaced_by
Appendix D: Summary of Design Decisions and Historical
Influences for the Pedro Project
Historical Influence 1: The community of potential end-users wanted software that could
produce data sets which complied with a formally defined domain model.
Historical Influence 2: the software project was partly funded by the ESNW, an
organisation whose remit was service provision, not research.
Historical Influence 3: a year of requirements gathering had been done prior to the initial
development of the software tool.
Historical Influence 4: Pedro would be a tool that would be maintained by domain
scientists who were not trained software engineers. The tool would have to accommodate
frequent changes made to the underlying data model.
Historical Influence 5: To make the tool easy to maintain, it was designed using a
model-driven approach.
Historical Influence 6: The model-independent nature of the tool encouraged other
domains to use it. Their feedback helped identify bugs, and led to new features which
helped to service the user community the tool was initially commissioned to support.
Historical Influence 7: Pedro’s ability to support other models was greatly improved by the
work of another developer who was not funded by the Manchester proteomics group. The
collaboration made the software code base more appealing for open-source project work.
Historical Influence 8: the remit of the body funding the software development was broad
enough to allow the tool to be applied and modified to suit multiple domains.
Historical Influence 9: the proteomics standards took so long to develop that the software
team began to focus on testing the tool on domains which had simpler or more mature data
models.
Historical Influence 10: another software engineer was brought in to make a testing plan,
rewrite training materials and interact with end-users. His detachment from the code base
gave him objectivity in evaluating how well the tool worked for users. It helped eliminate
biases that the main programmers might exhibit in justifying their work to end-users.
Historical Influence 11: The Pierre Project was built using the Pedro code base. This
helped improve the robustness and extensibility of core Pedro libraries.
Historical Influence 12: A lab scientist guided the development of Tablet Pedro, which
could be deployed on a Tablet PC. The development has shown that Pedro can be used in a
laboratory, and it promises to attract the interest of other domain groups who gather data in
remote areas. It also shows the program can be adapted to generate forms for alternate
forms of display.
Design Assumption 1: the underlying data model will change and all model concepts are
equally likely to change.
Design Assumption 2: the application would continue to be serviced by scarce developer
resources. These people would likely be skilled domain experts but not trained software
engineers.
Design Decision 1: Pedro will be developed using a model-driven
approach.
Task Constraint 1: Pedro will be designed to support data capture tasks. Although it could
have plugins that support other activities, its core architecture will not be designed to suit
other tasks. Other activities such as data dissemination, analysis and the provision of
security services will be dealt with in separate projects.
Front End Assumption: people using data capture tools will value usability more than
accessibility.
Web Technology Assessment 1: Web applications developed to promote widespread access
to data should not rely on special technologies for rendering forms. They should use plain
HTML forms that can be rendered by all browser client programs.
Web Technology Assessment 2: The Jakarta Struts project was the best web
technology evaluated to render Pedro as a web application.
Front End Decision 1: Pedro will be developed as a standalone GUI application rather
than as a web application.
Back End Decision: Pedro will store data sets as XML-based documents. Through plugins,
it can support committing data in other ways but the tool will not require the presence of a
data repository.
Language Decision 1: the MDA Design Tool and applications generated from it will be
written using Java.
Generation Decision 1: Application models will be used to generate forms at run-time
rather than rely on code-generation facilities.
Generation Decision 2: the MDA Design tool is a data entry application that uses a model
describing configuration properties. The tool will be generated in the same manner as the
other applications it helps create.
POSF Decision 1: The data entry schema and the mark-up services will evolve
autonomously at different rates. Therefore, decouple these things and support them through
separate mechanisms.
POSF Decision 2: The framework should be able to associate multiple mark-up services
with the same form field.
POSF Decision 3: The framework should support simple stub ontologies that can be used
during rapid prototyping activities.
POSF Decision 4: Base ontology services on ontology identifiers, not word phrases. Each
ontology identifier will be associated with a word phrase, and optionally a definition, a URL
that may describe a help web page, or an image.
POSF Decision 5: Support multiple formalisms. Do not limit support either for very simple
or very sophisticated ontologies.
POSF Decision 6: Let each ontology service comprise one or both of an OntologySource and
an OntologyViewer. Each of these objects is described by an interface. An
OntologySource provides terms and is designed on behalf of those who maintain ontologies.
An OntologyViewer renders terms provided by the OntologySource, and is designed on
behalf of those who use ontologies. The ontology service may be configured to mix and
match an OntologySource with an OntologyViewer.
POSF Decision 7: make the design of an OntologySource consider whether terms are
maintained locally or remotely.
POSF Decision 8: the framework should provide some way of determining whether an
OntologySource needs to be updated. End-users should be able to decide whether the
ontology service updates itself.
POSF Decision 9: require ontology services to provide meta data information about the
ontologies. This information should include the name, author, version, description and kind
of formalism supported by an ontology.
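The mix-and-match arrangement described in POSF Decision 6, together with the stub ontologies of POSF Decision 3, can be sketched as follows. The interfaces TermSource and TermViewer are hypothetical stand-ins whose method signatures are assumptions; they are not the signatures of the real OntologySource and OntologyViewer interfaces. The fallback behaviour when only one of the two components is configured is also an assumption made for the sake of a runnable example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of an ontology service that pairs a source with a viewer. */
class OntologyServiceSketch {

    /** Stand-in for an OntologySource: provides terms. Signature is an assumption. */
    interface TermSource {
        List<String> getTerms();
    }

    /** Stand-in for an OntologyViewer: renders terms. Signature is an assumption. */
    interface TermViewer {
        String render(List<String> terms);
    }

    private final TermSource source;
    private final TermViewer viewer;

    /** At least one of source or viewer must be present. */
    OntologyServiceSketch(TermSource source, TermViewer viewer) {
        if (source == null && viewer == null) {
            throw new IllegalArgumentException("one of source or viewer must be present");
        }
        this.source = source;
        this.viewer = viewer;
    }

    /** Renders the available terms; falls back to a comma-separated list without a viewer. */
    String show() {
        List<String> terms = (source != null) ? source.getTerms() : Collections.<String>emptyList();
        return (viewer != null) ? viewer.render(terms) : String.join(", ", terms);
    }

    public static void main(String[] args) {
        // A stub ontology of the kind that is useful during rapid prototyping.
        TermSource stubSource = () -> Arrays.asList("mammal", "bird", "fish");
        TermViewer countingViewer = terms -> terms.size() + " terms: " + String.join("|", terms);

        System.out.println(new OntologyServiceSketch(stubSource, countingViewer).show());
        // prints "3 terms: mammal|bird|fish"
        System.out.println(new OntologyServiceSketch(stubSource, null).show());
        // prints "mammal, bird, fish"
    }
}
```

Because the source and the viewer only meet through a list of terms, either one can be replaced without touching the other, which is the point of the decision.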