Appendix B: User Introduction - Department of Computing Science

advertisement
Aquatic Bacteria Diagnosis
Frank O’Hanlon
September 2011
Dissertation submitted in partial fulfilment for the degree of
Master of Science in Information Technology
Department of Computing Science and Mathematics
School of Natural Sciences
University of Stirling
-i-
Abstract
The Institute of Aquaculture seeks IT expertise and insight to oversee the automation of its
bacteriological diagnosis technique using information collated by the Institute from samples
supplied by clients of the Institute’s diagnostic consultancy service. This will be packaged in
a manner readily usable by the academics of the Institute.
The intention is to construct a product which will capture this diagnostic technique and
allow academics of the Institute opportunity to swiftly conduct their diagnosis via a simple
GUI. The package will be local to the final system, relying on no aspects beyond the user’s
system.
Diagnosis has the form of successively narrowing possibilities within a knowledge base
to determine which bacteriological culture most closely resembles an input. Facility to gain
insight from the data, such as determination of the minimum set of tests to fully best discriminate between remaining cultures, is desired. Embedding costs particular to each test and
factoring these into further recommendations required to pinpoint/move-towards-a-singular
diagnosis is sought.
Automation of the known techniques has been achieved, with significant progress made
on determining, by brute force, the set of tests which are required or recommended to best
distinguish or match the considered culture to one (or some) in the knowledge base. An effective, if unadorned, user interface has been implemented with avenues opened both in code
and in discussion as to how functionality may be improved and extended. Finally, suggestions are made as to other vectors which may be considered in implementation.
- ii -
Attestation
I comprehend the nature of plagiarism and the ramifications breach of University policy may
have.
I affirm that the project herein is my own work, informed only through frank and
open discussion, by the research and reading conducted in the course of preparation,
implementation and review of this project. Sources and references are identified with further
inspiration and influence being drawn only with the fullest integrity.
Signature
Date
- iii -
Acknowledgements
It is my foremost pleasure to humbly thank Dr Andrea Bracciali for his insight, input and
patience in supervision of this project. I would additionally thank the alumnus Ms Christine
Gannon for her immense hospitality during the course of the project and additional thanks to
those whose financial support ensured this project was possible; alumnus Mr Joseph
O'Hanlon, Ms Betty-Helen McMeikan and the Students Awards Agency for Scotland. Further
thanks goes to Dr Mags Crumlish for proposing the task and providing invaluable insight
throughout.
- iv -
Table of Contents
Abstract ................................................................................................................................ ii
Attestation............................................................................................................................ iii
Acknowledgements ............................................................................................................. iv
Table of Contents .................................................................................................................. v
List of Figures.................................................................................................................... viii
1 Introduction ..................................................................................................................... 1
1.1 Scope and Objectives............................................................................................... 1
1.2 Background to Diagnosis......................................................................................... 4
1.3 Technical Needs ....................................................................................................... 7
1.4 Technologies and Relevance.................................................................................... 7
1.5 Overview of Tools & Technology............................................................................ 8
1.6 Executive Summary ................................................................................................. 8
2 Background & Foundations........................................................................................... 10
2.1 Existing Products and Contemporary Work .......................................................... 10
2.2 Data Structures & Non-trivial Optimisation .......................................................... 12
2.3 Proposed Implementation ...................................................................................... 15
2.3.1 A0: Activate Diagnostor ................................................................................... 15
2.3.2 A1: Enter Test Input ......................................................................................... 16
2.3.3 A2: Move Back................................................................................................. 16
2.3.4 A3: Reset .......................................................................................................... 16
2.3.5 A4: Conduct Diagnosis..................................................................................... 16
2.3.6 A5: Select a Cost Preference (unimplemented) ................................................ 16
2.3.7 A6: Exit Diagnostor .......................................................................................... 16
2.3.8 Other Cases & Function ................................................................................... 17
2.3.9 Sample Story-Board & Class Diagram ............................................................. 17
2.4 Nature of the Underlying Data .............................................................................. 18
2.5 Capturing the Expert Insight .................................................................................. 20
2.6 Initial & Established Brief ..................................................................................... 21
3 Specification & Solution ............................................................................................... 22
3.1 Requirements ......................................................................................................... 22
3.1.1 Notes on Long-term Requirements................................................................... 22
3.1.2 Requirements of the Project ............................................................................. 23
3.2 Assumptions .......................................................................................................... 24
-v-
3.3 Data Cleaning & Manipulation.............................................................................. 25
3.4 Review of Algorithms ............................................................................................ 26
3.4.1 The Diagnosis Algorithm ................................................................................. 26
3.4.2 Recommendation Algorithm............................................................................. 26
3.4.3 Recommendations with Cost Algorithm........................................................... 27
4 Implementation & Function .......................................................................................... 28
4.1 Solution Achieved: A Working Core...................................................................... 28
4.2 Walk-Through: Use of the Mk0 Diagnostor .......................................................... 29
4.3 Towards Developer Testing ................................................................................... 34
4.3.1 Extremes & Boundary Testing.......................................................................... 34
4.3.2 Notable Errors & Malfunctions ........................................................................ 35
5 Evaluation...................................................................................................................... 36
5.1 Critical Review of the Mk0 Diagnostor................................................................. 36
5.2 What was not achieved .......................................................................................... 36
5.3 Deployment Solution ............................................................................................. 37
5.4 Reflection on User Feedback ................................................................................. 37
6 Conclusion ..................................................................................................................... 39
6.1 Approach ............................................................................................................... 39
6.2 Deployment ........................................................................................................... 39
6.3 Future Work .......................................................................................................... 39
6.3.1 Expanding the Diagnostor ................................................................................ 40
6.3.1.1 Incorporation of Costs into the Recommendation Algorithm. ................. 40
6.3.1.2 Reading/Writing to a free-standing Knowledge Base............................... 40
6.3.1.3 Proper representation of all test options (beyond positive/negative). ....... 40
6.3.1.4 Non-linear entry of test inputs. ................................................................. 40
6.3.1.5 Controlling Level of Detail ....................................................................... 41
6.3.1.6 Development of the Diagnostor by other means....................................... 41
6.3.2 Applying IT solutions to the Institute’s knowledge resources .......................... 41
6.4 Concluding Remarks ............................................................................................. 42
References .......................................................................................................................... 43
Bibliography ....................................................................................................................... 44
Appendix A: Installation Guide .......................................................................................... 45
On Maintenance.............................................................................................................. 45
Appendix B: User Introduction .......................................................................................... 46
Under the Bonnet ............................................................................................................ 46
Appendix C: User Guide .................................................................................................... 47
Appendix D: Legal Note – Institute of Aquaculture & Data Protection............................. 48
- vi -
Appendix E: Questionnaire ................................................................................................ 49
Appendix F: Questionnaire Responses ............................................................................... 50
- vii -
List of Figures
Figure 1.
Sample Knowledge Base ...................................................................................... 2
Figure 2.
Example Test Results ............................................................................................ 2
Figure 3.
Matching for culture C1........................................................................................ 3
Figure 4.
Matching for culture C2........................................................................................ 3
Figure 5.
apiweb glimpse #1 [4] ........................................................................................ 10
Figure 6.
Demonstration input screen from apiweb [4] ..................................................... 11
Figure 7.
Demonstration output of apiweb [4] ................................................................... 11
Figure 8.
Extract from: F psychrophilum biochemical database.xls .................................. 12
Figure 9.
Sample of the embedded Knowledge Base......................................................... 12
Figure 10.
Back from the Drawing Board ........................................................................ 14
Figure 11.
Sample Storyboards ........................................................................................ 17
Figure 12.
Proposed Class Diagram ................................................................................. 18
Figure 13.
Figure 1Class Diagram Solution ..................................................................... 22
Figure 14.
Initial GUI View ............................................................................................. 29
Figure 15.
Seven Inputs Entered ...................................................................................... 30
Figure 16.
All 26th in!...................................................................................................... 31
Figure 17.
Diagnosis and Recommendations ................................................................... 32
Figure 18.
Figure 2 Raw KB ............................................................................................ 32
Figure 19.
No Matches ..................................................................................................... 33
Figure 20.
Singular Match................................................................................................ 33
Figure 21.
The Trivial Case .............................................................................................. 35
- viii -
1 Introduction
Herein serves as a review of the problem, established and informed by discussion and feedback with the Institute of Aquaculture.
Scope and Objectives
1.1
From the outset, four particular objectives have been in mind with regards to the diagnosis
package, which is henceforth referred to as the Diagnostor.
These objectives are outlined as follows:
I.
Use inputs supplied by user to achieve diagnosis
II. Maintain and add to the knowledge base
III. Functional user interface
IV. Advise further tests to narrow diagnosis
Two additional functionalities had been originally considered but were largely dropped as
core concerns:
V. Implement Administrator and Inquirer user-types.
VI. Alter the accuracy of any given diagnosis.
To give these four principles some context, it is sensible to first consider the mechanical
aspect of diagnosis itself, as conducted by the Institute.
Consider that Figure 1 represents a condensed ‘knowledge base’ held by the Institute. Its
rows identify a given culture, its columns correspond to biochemical (and other) tests conducted by the Institute on each culture. Thus each individual cell marked by a test and a culture
corresponds to the result of a specific test on a specific culture1.
1
Strictly speaking, the complexity of the KB supplies by the institute is almost exactly this. The column
‘Culture’ is subdivided to the Genus name, Flavobacterium psychrophilum, whilst also being
appended with a strain ID. There is some nuance involved here but it is covered later when
discussing Data Cleaning & Manipulation.
-1-
Figure 1. Sample Knowledge Base
The Institute will then receive samples2 and begin the process of conducting their suite of
regular tests on the sample. There are six tests, known as primary tests, that are conducted to
identify the genus of the bacteria sample considered, these are conducted in advance of all
other (e.g. biochemical, molecular) tests3. Further to this, the suite considered originally is
composed of twenty further tests, meaning twenty-six tests in total are utilised in the analysis.
The tests results for such samples can be readily encapsulated as in Figure 2:
Figure 2. Example Test Results
So, for a given sample, say, the culture C1, it is seen that only tests 1, 2 and 3 have been
conducted. The remainder are as-yet unknown4. Similarly, culture C2 has only tests 1,2,6 and
7, with C3 having tests 2-5. By visual inspection it is relatively easy to check these samples
against the KB. Allowing that “/?” can mean a test can be either “+” or ”–“, Figures 4 and 5
plainly demonstrate the ‘match’ mechanism visually for cultures C1 and C2. ( Examining the
2
It is rare that the Institute receives actual test results directly from a customer; for the most part
actual, living samples are collected and tested.
3
These tests are, Gram, Shape, Oxidase, Motility and Fermentation. For Gram and Oxidase, they have
positive/negative result forms, whereas those remaining have their values being more elaborate
still, e.g. shape is, for Flavobacterium psychrophilum,
4
As the institute typically conducts the entire suit of tests, this aspect is introduced somewhat
arbitrarily, to allow for the case that only some tests are conducted, even if this is not common or
would represent only a transitory state in the Institute’s information. (E.g. if still waiting for the
other test results to be obtained.)
-2-
KB for C3 you will see that its supplied tests do not wholly match anything in the sample KB
seen in Figure 1.)
Figure 3. Matching for culture C1
Figure 4. Matching for culture C2
Very briefly: C1 matches a singular culture in the KB, Culture A. C2, however, could be
culture A or culture B: more tests must be conducted to discover which it is!5
In essence, however, it is this simple ‘by eye’ manual inspection by which the academics of
the Institute have been making their diagnoses. This, plainly, is a somewhat trivial and tedious
exercise in pattern matching: humans can easily make mistakes or, worse, get bored and distracted. Better, surely, to utilise everyone’s mechanical silicon-based friends whose innate
abilities are much better suited to such an endeavour.
To that end the primary objectives are then clearly:
I. Obtain inputs from the user to obtain a diagnosis to one of three possibilities:
5
i.
Single match, as for sample culture C1
ii.
Multiple matches, as for sample culture C2
It is expedient here to note that in the case of Culture C2, distinguishing whether it matches to A or to
B (or to nothing our KB knows of) offers the options of tests 3, 4 and 5. However, the result for test
4 and 5 are the same: - for A, + for B, this is a minor aspect of a much larger problem in concocting
the recommended tests given by the Diagnostor: several tests may offer the same information, so
only one of those ‘equivalent’ tests need be conducted to narrow within the KB. (Note, trivially, for
only two possibilities, only one test is needed to be conducted. A hypothetical test 8 could be
considered which may have, like tests 1 & 2, + and + for both A and B: conducting this mystery test
8 would be fruitless for the endeavour!)
-3-
iii.
No matches in the knowledge base, as for sample C3.
II. Maintain and add to the knowledge base allowing academics to keep their records up
to date.6
III. Functional user interface to speedily and intelligibly enact user desires and convey
results.
IV. Advise further tests to narrow diagnosis in the case I.ii corresponding to culture C2.
1.2
Background to Diagnosis
Now that the problem is set, it is worth establishing the real-world context and need underpinning this project.
The Institute of Aquaculture offers a commercial diagnostic consultancy service worldwide[1]. They offer a wide range of services, notably in taking aquatic samples, typically fish,
on behalf of their clients and in doing so apply a 'suite' of twenty standard tests. The sampling
is typically enacted to check for the presence (and then diagnosis) of specific bacteria strains7.
There are six particular tests whose results will, for the bulk of the Institute's work, constrain
the given sample to a specific, particular genus. Within this genus, the aforementioned suite of
twenty tests can then distinguish to a particular strain.
The Institute conducts these tests themselves on campus. Once the results are collected the
academic in question will then 'compute' the results by hand: literally observing the tabled results and checking them off one by one. This can be considered a tedious job, one which is
readily handled by information technology8, hence the request for computing insight into resolving the issue.
6
It should be re stated that this objective was not achieved in the project and is therefore only given
cursory treatment and exploration.
7
As noted in 1.1, the Institute identifies bacteria cultures first by six Primary Tests, which reliably
identify the given Genus, whilst then conducting a suite of secondary tests which aide in the
pinpointing of which Strain is being handled. This hinges on the scientific classification of organisms
in biology. By the International Code of Zoological Nomenclature[2] there are seven nested classes
here listed in increasing specificity: Kingdom, phylum/division, class, order, family, genus, species.
In this manner, when Strain is discussed, it of course refers to a sub-member of a species of
bacteria.
8
Pattern matching is well covered in IT. Most users will be intimately familiar with crtl+f ‘Find’ searches
within a document; the what is desired within this project, by encapsulating that which the
academics already do (but faster), is to extract information from a given KB and set of inputs and
combine these sources in such a way as to readily facilitate the application of whichever pattern
matching method is most desired. Though, of course, human cognitive ability in pattern matching is
still rather readily more advanced/complex, as the still efficacious CAPTCHA tests show, such
advanced aspects are not utilised in the task at hand.
-4-
Moreover, it is noted that this diagnostic service is a commercial endeavour. As with most
such endeavours, there is a manifest interest in reducing costs and efficiently using resources.
It is doubtless the case that the Institute's academics are a valuable resource and, therefore, so
is their time. If a package could be developed which would minimise the time an academic
spends on any given consulting diagnosis, then there is a clear benefit to be reaped.
There are further benefits relating to this focus, prominently: the potential to reduce the
costs as offered to the clients: if the process of diagnostic consultancy consumes less of the
Institute's resources, it is possible that the cost of offered services may be reduced. Alternatively, or indeed additionally, there is the likelihood of increased throughput on the part of the
Institute. It would require the time taken to compute the diagnosis to be a significant concern,
of course, which is not strictly given, but if the specific details of the proposed Diagnostor
bear fruit (especially the ability to recommend only the minimal further tests necessary to
achieve a diagnosis within a KB), then it is not at all inconceivable that this could improve
efficiency overall.
The Institute of Aquaculture has collected a large and comprehensive bacteriological
knowledge base which constitutes a very desirable resource. As will be discussed in Section
2.1, it is the case that modern commercial diagnostic packages are both expensive and illsuited to the Institute’s purpose: they have their own knowledge base and their own routine for
managing the consulting of it. As this is the case, the Institute has a clear interest in trimming
time spent by its academics in unnecessarily computing diagnoses when there are other approaches which may be both quicker and, potentially, more reliable9.
Through discussion with the Institute, it is plain that there are several primary factors involved in the diagnosis process. The first and most involved is the actual cultivating of
cultures, the application of the tests in correct manner and the collection of the results. This
'labour time' is an accepted part of the process and is not something that is within the remit of
the project. Nevertheless, arbitrary data has been formed to illustrate how simply connecting
the knowledge of the time spent on each individual test could be incorporated into the Diagnostor and factored into some of the decisions which, without it, may be made on other
information or, indeed, wholly arbitrarily (see section 3.3: Data Cleaning and Manipulation).
9
A major feature of the early development of the Diagnostor was musing on the nature of the
knowledge base and how it was maintained by the Institute. Though the Diagnostor fails to
accommodate the maintenance of the KB, that remains an outstanding IT problem for further work
in this field: the KB must be managed in a sensible manner with fairly rigorous standards applied
throughout so as to encourage capitalisation and exploitation of the information resources they
have assembled for themselves.
-5-
Concluding that consideration, there are two other factors concerned with the process. The
first follows tangentially to the first's labour costs: the delay involved. Cultivating cultures to a
suitable point where test results can be achieved is likely an involved process. Though typically the established suite of packages is conducted as routine by the Institute, it is not
unthinkable that, given some redundancy in which tests are needed, corners could be rightfully
cut if the redundancy is not desired. Though it is repeated elsewhere, it is best to be secure on
this point:
Consider two hypothetical tests, Alpha-Ted and Beta-Dougal. If, having conducted
and diagnosed for several preceding tests, the knowledge base allows it to be diagnosed to a selection of, say, four cultures, these two tests remain to be conducted. But,
both Alpha-Ted and Beta-Dougal, make the same predictions! E.g. if a particular culture is indeed to be diagnosed, both Alpha-Ted and Beta-Dougal will give the same
result when conducted, regardless of which culture is actually present.
Or, more bluntly, both Alpha-Ted and Beta-Dougal will allow the same level of discrimination between the four cultures. (Unless, trivially, they eliminate: none of the
four cultures match the input, meaning the input doesn't match our existing
knowledge.) Only one is needed to ‘best’ distinguish between the four cultures.
This problem and proposed solutions are discussed in more detail later, but it serves to illustrate a pressing point with regards to the diagnosis process: though the tests Alpha-Ted and
Beta-Dougal remain to be tested, they are not both required to arrive at the best answer the
knowledge base could yield. That is the central issue arising from the redundancy of tests:
once some new knowledge is obtained about a client's sample, the already-assembled
knowledge base can be seen to allow improved efficiency.
A final concern raised by the diagnosis process is, essentially, an ethical one. Diagnosis of
bacteria, whilst so far regarded as a scientific endeavour with some commercial ramifications,
is highly important to the well-being of the ecosystems where these bacteria dwell. Though
commonly the Institute deals with farmed-fish populations (certainly that is the relevance of
the base this project worked with primarily: Flavobacterium psychrophilum), where diseasecausing bacteria can cause havoc, there is a wider health interest beyond the immediate wellbeing of the fish; notably those dependent on it. Austin and Austin, 1993 support the serious
repercussions of this view:
-6-
“It is apparent that most attention has been devoted to diseases of farmed fish species.
Perhaps not unnaturally the reasons reflect the high value of the stock, and he serious economic importance of losses attributable to bacterial fish pathogens” [3]
Though it hardly needs stating: the efficient, concise and reliable diagnosis of potentially
harmful bacteria (and the attendant knowledge gained in general) will surely go some way to
improving the quality of life of all those concerned; the web of concern spreads, even if at
least by gossamer tangibility.
1.3
Technical Needs
The package utilised previously by the Institute is the clinical diagnostic apiweb product
which was offered by bioMériux[4]. The Institute has long since disavowed itself of this
service, notably because the interface was purportedly clunky, unclear and generally offering
too many options beyond their own concern. That said, it suffices as a typical complaint:
existing clinical diagnostic tools are not suited to the Institute's particular needs.
1.4
Technologies and Relevance
As best can be surmised, the apiweb interface is precisely that, a web-hosted interface
which allows communion with, perhaps, a PHP script which in turn consults the relevant database sustained at server-side, or likely separate from the client. Though a web-hosted
mechanism was considered (and indeed is still entirely viable: the diagnostic algorithms can
surely be translated to similar functioning SQL queries), the decision to focus on a user’s desktop application seemed most sensible.
The choice of programming languages to be considered was quickly narrowed to Java and
Python: Java for simple expedience and familiarity, Python for its reputed strength in handling
and manipulating high-level data structures. Though progress was made in learning Python,
some setbacks and delays, notably in obtaining the relevant data, meant that progress was
largely made unguided and somewhat unfocussed whilst other aspects of the project were being established.
Both Java and Python offer ready portability, Java was held to as being most widely accessible and the most convenient to begin with. With elaboration (see sections 2 & 3) made on the
potential design ‘in the long run’, it was then plain that initial progress on a Diagnostor to provide to the Institute would have to constitute a ‘proof of concept’ or ‘core’ to any larger project
tool. Design considerations supported this, leading to the somewhat poorly organised three
classes assembled which constitute the Diagnostor and effective KB.
As this is thought to be a core module, a further strength in the selection of Java is thought
to be the relative neatness with which prior-familiarity would allow code to be prepared. In the
-7-
case that the Diagnostor proves useful and attractive, which is indeed what is hoped, further
development from this basic package should be easily facilitated.
Tangentially, it is also worth noting the rejection of other more potent mechanisms for
matching and analysis: as the method already used for diagnoses by the Institute is so simple,
proper exploration of, say, case based learning or more detailed techniques, such as composing
the Institute’s KB into something upon which data mining techniques may be applied were
largely dismissed early on as being particularly fanciful given the relative simplicity of the
problem. Though they are potent and applied to more general diagnosis problems, it was clear
that capturing the expert’s own method would be the primary concern; progression beyond that
could wait.
1.5
A View of Tools & Technology
Broadly, only the Java Virtual Machine is particularly required to engage the Diagnostor. It
was developed primarily in Eclipse, though BlueJ was used on odd occasion. As the KB was,
unfortunately, hardwritten into the code of the Diagnostor’s KBBoundary class, there is evidence of the facility in the code, allowing any who follow in these footsteps to see the
preparations made for allowing the KB to be read, say, from a .csv file. As there was (what is
thought to be a minor) hiccough in the road to establishing the file reader and writer aspects,
the Diagnostor as it stands is very much a stand-alone package.
Of the potential complexities afoot, it was found that most of the academic hurdles could
indeed be solved with effectively simple IT concepts: single- and two-dimensional arrays, integer, string and Boolean data types, nesting of arrays and the correct arrangement of
parameters. Though this leads to some unwieldy or inelegant code in places, it is nonetheless
effective and, with application of good software development practices in indentation and sensible naming conventions and so forth10, it is hoped to be readily extended. The text “Data
Structures with Java” was very widely consulted in this regard[5].
1.6
Project Summary
With the problem of diagnosis of aquatic bacteria established, the aim of a maintainable KB
accessed through a GUI, which is queried and offering advice for further tests based on the
knowledge (and input) given having been presented, the Diagnostor itself falls under inspection.
10
It is mused that the most perplexing of the names are those of the objects afoot in the Diagnostor.
The instance of the Archive class we’re concerned with is denoted “biscuityKnowledge”. This stems
from Swede Mason’s “Masterchef Synesthesia”[6] which interfered with progress from time to
time, any time the word ‘base’ was considered. That is: that buttery biscuit base.
-8-
The elements of the KB are held as identified Strains with fully complete tests listed for
them. Further to this each strain has a name (in this case “Flavobacterium Psychrophilum”)
and a Strain ID. The KB supplied by the institute has been padded with hypothetical (and arbitrary) cost values, notably for Financial, Delay and Labour concerns. The test ‘columns’ are
headed by their test name and have each one of the three respective costs as well as a result for
that test corresponding to each strain held in the KB. The KB is held directly in the KBBoundary.java file as the maintainable aspect of the Diagnostor has not been achieved.
The approach taken is to accurately discern and model the process of diagnosis conducted
by the Institute’s academics and produce a GUI which should greatly speed up that process.
As such, inputs can be taken in the form “?”, “+” or “-“ relating to whether a given test is
unknown/undeclared, positive or negative. For the twenty-six tests (six primary, twenty biochemical/molecular) an input is declared and diagnosis is conducted. The GUI is then updated
to correspond to the three possibilities: single match, multiple match or no match. This has
been achieved.
The algorithm for determining which recommendations are to be given transpired to be a
non-trivial problem in optimisation and, as such, the progress which has been made, indeed
discerning the precise problem and the computing complexity required to achieve it is viewed
as a particular (albeit partial) success of the Diagnostor.
It is delivered as a Java application, ran in suitable environment such as Eclipse. Design
notes and consideration accounting for the form, nature and decisions made will be covered in
more technical depth in the remainder of this report.
The Diagnostor is far from a complete package suitable for ready exploitation by the Institute, but it stands as a ready package, a genuine proof of concept that with further development
and consideration it could be expanded to prove a functional and viable tool. In essence, it is a
satisfactory demonstration of the efficacy of the approach, its nature and realisations readily
reapplied to alternative deployments as necessary or desired.
-9-
2 Background & Foundations
It is noted that existing tools are either ill-suited or not easily reapplied to the problems faced
by the Institute for Aquaculture.
2.1
Existing Products and Contemporary Work
The Institute for Aquaculture’s concern is, unsurprisingly, aquaculture. Though many bacteria
are flung widely in nature, being essentially ever-present, the established packages are tailored
too closely to mammalian and, more specifically, human bacterial strains. Though they can
provide some information, their tests are conducted in, for aquaculture purposes, overwhelmingly the wrong environment. Temperatures and pressures in water will be distinctly separate,
it is asserted by the Institute, that many bacteria give widely different responses when subject
to different environments; in effect the tests conducted and the knowledge assembled is not
widely useful in its reapplication to the aquaculture environments of concern to the Institute.
Figure 5. apiweb glimpse #1 [4]
As can be seen in Figure 5, this product the apiweb, which the Institute had previously used
is operating on a largely different level and accessing a differently formatted KB: the dimensionality and variation needed to cluster profiles as in Fig. 5 is indeed significantly more
complex than needs discussing, apiweb is thus already offering a complexity which is wholly
- 10 -
unnecessary. (This is compounded by the database used being likely invalid for the aquaculture environment.)
Figure 6. Demonstration input screen from apiweb[4]
Figure 7. Demonstration output of apiweb[4]
- 11 -
Figures 6 and 7 give an indication to the nature of the process involved in using apiweb:
use by a non-specialist is not obvious given the problem already outlined. Though it is shakily
inferred, it does support the idea related by the Institute that the product was not exactly wellsuited to their purposes.
2.2
Data Structures & Non-trivial Optimisation
It is worth considering the manner in which the KB was to be stored and utilised within the
Diagnostor. Though supplied as an Excel-prepared database of the form:
Figure 8. Extract from: F psychrophilum biochemical database.xls
Enzymes
Culture
recoded id
Flavobacterium psychrophilum
1
Flavobacterium psychrophilum
1a
Flavobacterium psychrophilum
1b
Control
neg
alk
pos
C4
var
C8
var
-
+
+
+
+
+
+
+
+
+
It was hoped that, by converting to .csv format the file could be read directly. As has been
noted before, this was not achieved within the project and instead it was hardwired as a twodimensional String data type array (albeit padded with some extra information, see Section 3.3
for further information):
Figure 9. Sample of the embedded Knowledge Base
Care was taken to ensure that the kbProxy was dimensionally correct, that the length of all
its rows and all its columns were consistent (as variation in these, though permitted in Java,
would require detailed fine-tuning in the management of sizes as they progress through the
Diagnostor). This kbProxy is held in the KBBoundary class, where it would be intuitively expected that any KB being read from outside the Diagnostor itself would enter into it.
As the KB is now principally encoded as an NxM array, loosely corresponding to N cultures and M tests (strictly these are slightly larger, as the costs, names, IDs and so forth must
be padded in suitable places). However, the immediate intent was then to vivisect the kbProxy
and translate it into several smaller, ostensibly relational, knowledge bases:

Names corresponding to the tests.
26-long 1D array
- 12 -
String

Names corresponding to the cultures.
74-long 1D array
String

IDs corresponding to the cultures.
74-long 1D array
String

Each of the costs for the tests.
3x26 2D array
Integer

The cultures and their test results.
74*26 2D array
Boolean11

Non-rectangular options for tests
26*variable array12
String?
As the data structure discussed is primarily arrays, this raises some other considerations.
Foremost is the manner of organisation and managing the associations of these arrays. It is
clearly seen that the two main ‘keys’ for the knowledge base would, discursively, be ‘Tests’
and ‘Cultures’. Provided the indices for these are tracked and that the padding/management of
the arrays is adhered to consistently (and is not so esoteric as to be unmanageable by anyone
else who observes the inner workings of the Diagnostor; that they are managed as intuitively
as possible), then these structures will be well managed.
Similarly, when it came to considering a GUI, use of the names corresponding to tests to be
input, the names of cultures (when reporting diagnosis) and so forth was particularly useful:
again, allowing for consistency and intelligible inference based on consistent use of indices
was well rewarded.
A second consideration on the point of arrays is that this effectively disallows some potentially more intriguing data structures.
As the method of diagnosis is effectively similar to classification, the temptation was there
implement a class which itself not only analyses the KB, but essentially dissects it and uses the
components to form wholly new data structures, such as a manner of decision tree which
might be applied to the entire dataset. In that way, any queries input by the user might take the
form of progression through a tree rather than being forced to input (or skip by use of “?”)
each and every test. Close analysis of the KB given clearly shows that many of the tests are
11
This is enforced by choice in this instance of the Diagnostor. In the KBBoundary class the method
actually used in our solution translates from textual input, including non-binary values like “l” (rod
shaped) or “pig yellow” (a culture’s appearance at 15⁰C). As the entries in the source KB are
uniform, this core process of the Diagnostor simply treats these values as “+”, or, strictly: True.
(Where “-“ corresponds to False.)
12
This is never implemented in the Diagnostor: the fifth ‘row’ of the array kbProxy is assiduously
avoided. A mechanism of splitting each element was entertained, as noted by the internal separator
‘;’, e.g. as “+,-“, “l;o”, “WG;?”, thus allowing the potential options to be ‘read’ from the inbound KB.
This is of particular use in the Primary Tests whose values are not always simply positive/negative,
but which can be quite varied. Standardising for, or at very least comprehending the nuance and
working around these and incorporating them is a surely important, but was not explored in this
instance of the Diagnostor.
- 13 -
not entirely relevant once you have some matches established, as such the KB does lend itself
towards this perhaps innovative system.
As the project’s KB has been formed from only one of the Institute’s data sets, that is the
database “F psychrophilum biochemical database.xls” supplied, the decision was taken early
on to stick to the ‘crude’ method of manipulating arrays; the potential nuance of further sets
was not known and, as had additionally been committed to early on, the intent was to first replicate and automate the existing method conducted by the academics. As such, tree approaches
(and even the use of lists rather than arrays) were largely dismissed, mainly because it was
expected that brute force and arrays, simple tools, would be quite sufficient for the task.
This optimism confidence was, perhaps, misplaced. It became apparent rather quickly that
the idea of determining the set of ‘recommended tests’ needed to narrow a multiple-match result would be a non-trivial optimisation problem. Though, for effectively small numbers and
low dimensions, the problem is still managed by brute-force, it is not exactly efficient. Or aesthetically pleasing, as Figure 10 indicates.
Figure 10.
Back from the Drawing Board
Figure 10 relates a brief glimpse of the method of determining recommendations. It hinges
on narrowing the knowledge base: knowing that, once some tests are input, those aspects can
be neglected and that, having made a diagnosis to more than several cultures (S1,S5, S7 & S8
in Fig.10), the possibilities for remaining unconducted tests can be used to inform progress.
- 14 -
That said, in such cases some tests will not be useful, the possible answers, if conducted, could
be fruitless, they could all be “+” (as the case for Test 3 when compared against S1,S5,S7 &
S8). The steps involved are conceptually a tad difficult, but already strongly hint that only a
vague familiarity with discrete mathematics could be crippling or, at best, embarrassing in
formalising this aspect of the problem...
First the ‘profile’ for each remaining test (the results for each diagnosed culture) must be
determined, then it must be analysed for homogeneity. If it is not homogeneous then it should
be grouped (if possible) with other tests...because some tests might yield the same possible
answers (and thus be said to be “equally discriminating” for the cultures we’re still considering to narrow down)! And, once they are grouped together, only one need be selected from
each group for the recommendation.
But, before conducting the analysis, the number of groups of equally discriminating tests
and the number of discriminating tests themselves (if any) are not readily known.13
2.3
Proposed Implementation
Though a hope was made early on in conceptual development to allow for two user types, an
Administrator (who could manipulate the KB as well as consult the Diagnostor) and an Inquirer (who could only consult the Diagnostor, without executive ability to alter the KB), the
inability to manage direct access of a stand-alone KB separate from the Diagnostor itself
meant this was swiftly relegated from concern. As such, the Actor for conventional purposes is
now a generic Inquirer, someone who has the ability to access the Diagnostor’s functions via
the graphic user interface, but who is effectively separated from the KB.
It is then viable to consider the possible use cases of such an Inquirer.
Actor: Inquirer
2.3.1
A0: Activate Diagnostor
The Inquirer is able to commence the running of the program, then to be allowed access to the
initial GUI and further options.
13
There are alluring corollaries here: that the possible groups can never be greater than the number of
tests involved (and even at most, equal numbers, each group would contain only one test), also that
homogeneous tests can be safely discarded – either they will support the diagnosis without
narrowing, or they will invalidate the diagnosis and say that the KB has no knowledge of such a
strain.
- 15 -
2.3.2
A1: Enter Test Input
The Inquirer, for most stages of the GUI except when all inputs have been assembled (e.g. after inputting the final test value), will have the option to select a suitable input value for a test
(which will be sequentially entered). By pressing the button, they will communicate their
choice to the GUI which will store it properly. At each stage a test input is added to set I[t]
(inputs with each value corresponding to a specific test).
2.3.3
A2: Move Back
Except in the very first state, the Inquirer will have the ability to ‘move back one step’ in the
entry of data via the GUI. This will prompt the correct update of the GUI.
2.3.4
A3: Reset
At any stage, the Inquirer may reset the GUI and the Diagnostor to its initial state as if no inputs had been entered and, if diagnosis has been conducted, as if none had been undertaken.
2.3.5
A4: Conduct Diagnosis
The Inquirer, having assembled a complete set of choices for set I[t], effects a diagnosis which
submits this set to the Diagnostor’s innards which consult the KB, narrow and returns the set
M, which relates the result of the diagnosis14 along with any recommendations.
2.3.6
A5: Select a Cost Preference (unimplemented)
Prior to the conduction of a diagnosis, the Inquirer may select which of three costs (financial,
labour or delay) is of concern, this will influence the manner in which any recommendations
are selected for return to the Inquirer via the updated GUI.
2.3.7
A6: Exit Diagnostor
Simple functionality allows the Inquirer to close the Diagnostor interface. In a more extended
version, this exit might prompt a warning of, say, partially input cultures to be added to the
KB. As such functionality was not explored, in the end, the exit use-case is simply to end the
Inquirer’s interface with the Diagnostor.
14
As before, this will be one of three possibilities: single match, multiple match or no match.
- 16 -
2.3.8
Other Cases & Function
As consideration was given to other possible implementations and functions which remain
desirable features in a more developed Diagnostor, there are other use cases which can be noted for posterity:
a) Begin the introduction of a new culture to the KB
b) Select a new value as part of a new culture being added
c) ‘Writing’/saving a new strain’s data to the KB
d) Discard an in-progress entry (see discussion in ‘Exit Diagnostor’ case)
e) Begin the introduction of a new test to the KB, etc.
2.3.9
Sample Story-Board & Class Diagram
Early in the design process, some effort was made to anticipate a long-term view of the Diagnostor project. The pertinent surviving records of that era of development demonstrate the
conceptualisation of a more nuanced GUI, involving interaction more common to a nice website or well adapted Flash presentation, as crudely seen in Fig. 11; as the GUI was not deemed
an overwhelmingly pressing aspect of this project, development on that front flagged significantly, though the concept still remains viable for further expansion (especially in light of test
users’ feedback).
Figure 11.
Sample Storyboards
In terms of structure, it presents some significant departures from the actually implemented
version, but it does inform the vision underpinning the Diagnostor with aims beyond Mk0.
Another holdover from before implementation is the early Class Diagram proposals. Once
decision was taken to focus on making the Diagnostor only the ‘core’ of a wider system, adherence to Class Diagram and sequence diagrams stemming from the prior outlined Use Cases
- 17 -
significantly dropped. Nevertheless, the ethos underpinning UML was held firmly in mind,
albeit making for very conscious awareness of how poorly jointed the final implementation
actually became. Nevertheless, as it elaborates the design process and is useful in comparing
the initial ideas to those that held through development, we now see in Fig 12 one of the earliest yet detailed class diagrams, poorly noted as it
is15.
Figure 12.
Proposed Class Diagram
Significantly with Fig. 12, it is worth noting that some of the functions are indeed still present in the Mk0 Diagnostor. The user boundary and GUI have been fused into the Primer.java
class, whilst the bulk of information exhibited by the Data Controller and Algorithm classes
are actually contained in the Archive.java class. The KBBoundary held over as expected,
though with a more elaborate format which is notably more thorough than that seen here.
2.4
Nature of the Underlying Data
The given biochemical tests all have the form, in the original data set, of “+” or “–“ entries.
The arbitrarily added Primary Test data (-, l , - etc) begins in a less helpful way but is forced to
fit to the “+”/”-“ standard for our purposes. This, plainly, (and as seen in the view of the api15
A key lesson surely obvious to most is that if a significant scribble is hastily made on a busy train, it
should at first opportunity be updated and formalised; not left as an untidy future reference.
- 18 -
web tool) is not necessarily the whole story. Biochemical test data may take the form of, say,
visual inspection of applied solutions for colour change or any variety of particular tests. It is
not difficult to imagine many variety methods of test output, both qualitative (e.g. colour
change, information by inspection) or quantitative (measure of mass, molarity etc).With this in
mind it is plain that the database supplied is not likely the whole story. Indeed, beyond the
scope of the Diagnostor, but reflecting part of the use to which it is expected to be applied,
there is some cause for concern on the validity of casually associating the presence of an organism to any given conclusion on cause of disease:
“A question mark hangs over the significance of some organisms to fish pathology – are
they truly pathogens or chance contaminants”[7]
Whilst it is prudent not to overstate any importance to the Diagnostor’s ability to match input to the KB, it is worth a moment’s speculation as to what, computationally, could be done
with more fulsome raw data and better understanding of the cogitative problems facing the
field of aquatic biology. Certainly, as late as 1993, authors note:
“The ubiquity of bacteria in the aquatic environment where they play a major role in
both synthetic and degradative processes, makes the task of the fish bacteriologist far from
straight forward. The lack of more than a vestigial taxonomic framework, leading to very
incomplete understanding of the relationships between the various groups associated with
fish diseases or spoilage, makes the logical study or classification problematical. A full understanding of cultural requirements, biochemical properties and antigenic and genetic
characteristics is being developed only gradually” [8]
Within the last two decades,, at least, it seems the state of their art is indeed hampered. In
collusion with what has been said regarding diagnostic tools available to the institute, it would
appear that aquatic bacteriology is lagging in comparison to its more glamorous peers; human
and mammalian bacteriology.
Considering that conjecture, briefly, with a knowledge base such as the Institute possess,
ready and able for worldwide consultancy, there is possibility that the data underlying the set
on which this Diagnostor has already been prepared, may yield more interesting and potentially more powerful classification mechanisms.16 Of course, it is also entirely possible that the
16
It was considered during development that a mechanism capable of handling only partial matches
may prove to be a useful research tool. Though only receiving a tepid response when proposed,
mainly due to its speculative nature, it is noted for posterity. By selecting, say, a k-value for a
diagnosis, a threshold to which the diagnosis must adhere accurately. E.g. if a k-value of 0.2 is
selected, then only one in five of the supplied tests need be matched accurately. This could be
immensely less efficient. In discussions with the Institute it was touched upon the high accuracy
typically required only uncommonly deviates significantly lower than, say 98% accuracy.
Implementing some variation over very high accuracy and, perhaps, along with the use of ‘raw’
- 19 -
data possessed is insufficient or that the information systems methods mused on would be inappropriate to the task.
As it is known that the data is well divided such that Genus can readily be established from
the Primary tests: Gram, Shape, Oxidase, Motility, Fermentation and the inspection of appearance on the selective agar the culture is cultivated upon (at 15⁰C), it is already possible to
assert a stratification to the diagnosis process. Though the Diagnostor does not reflect this, it is
something that has been consciously remembered throughout development and study: the first
six tests (strictly, the first seven including the ‘control’ test, which is always negative in the
present Diagnsotor’s case) all give specific, unchanging results for F. Psychrophilum.17
2.5
Capturing the Expert Insight
The nature of the method for diagnosis was largely determined quite quickly. As has been described, the knowledge base takes the form of positive and negative values for some
biochemical tests, with more complex results allowed for the Primary tests.
Once an academic in the institute is in possession of a set of test results, it is then their lot
to, by hand, begin matching off the results against those in the tables. For F. Psychrophilum it
is such that there are only several dozen cultures and twenty individual biochemical tests. It is
inferred that, as this is only one genus’ worth of bacterial strains, that there is a measure of
‘narrowing it down’ before even this process of ‘consulting the tables’ was begun.
Human concentration and tolerance for tedious or fine tasks is not infinite and it is easily
imagined that the academic’s mental prowess might well be spent elsewhere when a well prepared machine may conduct it exceedingly quickly. This was the primary drive to this:
academics are capable of doing this, but there is no real expertise involved beyond first preparing and arranging the tables; all else is tedium. The prospect of deeper analysis of the data
was only introduced by the author, though ultimately set aside in favour of development of the
Diagnostor as remains to be seen.
data (numeric or qualitative) rather than +/- values may allow such an endeavour to be better
explored. Unfortunately as it is not immediately obviously a productive avenue, it was, again,
largely dismissed.
17
Those results are: Gram – Negative; Shape – Long filamentous rods; Oxidase – positive; Motility –
weakly gliding; Fermentation – oxidation; Appearance on selective agar at 15*C – as yellow
pigmented bacteria. Additionally there is the control which is always negative. When input into the
Diagnostor, this run of six results (plus control, for seven) for F. Psychrophilum would be as “-, +, +,
+, +, +, -“.
- 20 -
2.6
Initial & Established Brief
After elucidation of the problem through contact with the Institute, the path and immediate
remit of the project as easily determined:

To construct an information system which will act upon a knowledge base (ideally accessing and maintaining it, though not yet realised) through which diagnostic consultancy will
be performed by Institute academics.

Primary function is to enact accurate diagnosis with rapidity.

Diagnosis is to, at least initially, take form as matching that practiced by academics.

A desktop application ultimately to be held within (a few) machines in the institute.

As other clinical diagnostic software is available (but ill suited) there exists potential longterm for commercial use: The package should be prepared with mind to potential further
extension beyond whatever is achieved in the short term.
- 21 -
3 Specification & Solution
The solution given is referred to as the Diagnostor.
It is a Java application composed of three classes:
Primer in Primer.java, Archive in Archive.java and
KBBoundary in KBBoundary.java18. As can be
seen in Fig. 13, they are rather simply organised
with respect to one another.19
Figure 13.
3.1
3.1.1
Figure 1Class Diagram Solution
Requirements
Notes on Long-term Requirements
As has been mentioned, there is potential for long term production on the Diagnostor, it is a
project which is of value to the Institute and, further to that, there is possibility of commercialisation.
With that in mind, there are concerns both towards the modularity and form of the product.
The GUI as established has been prepared with little mind to aesthetics, rather the functionality and concision of the display was paramount. As the GUI is handled in the Primer.java class,
it is noted that a sustained, long term implementation of the Diagnostor will highly likely see
an independent GUI class formed, especially if further use-cases which have been considered
are implemented.
Moreover, throughout the project is was therefore doubly important to hold to good software development practices in the event that, regardless of the outcome of the project, the
Diagnostor may be picked up by another developer for further exploration or utilisation by the
Institute.
18
Some methods which are not utilised in the implementation, but which were developed and still
persist within the KBBoundary.java file were heavily informed, as a starting point, by discussion at
DANIWEB[9]
19
It is noted, as off completion of development, that the classes Archive and Primer are somewhat
bulging. They are effectively ready to be decomposed into four separate branches, one each
governing: Primer becoming a dedicated user boundary and a GUI operator, whilst Archive would
separate out to become a data controller and an actual diagnostics operator which would hold the
computational ‘power’ offered by the Diagnostor.
- 22 -
3.1.2
Requirements of the Project
The Institute of Aquaculture seeks a desktop application which will allow the user to conduct
diagnoses (to specific bacterial strains) by providing the system with a set of bio-chemical test
results as input. On top of the six Primary test values, these biochemical test results individually take one of three forms: Positive, negative and unknown.
The application has four primary functions:
I.
The application will furnish the user with the ability to input these tests and obtain a
diagnosis.
II.
To return recommendations of which other tests would be conducted as a means to
further narrow a diagnosis (e.g. in the case of multiple matches). 20
III.
Further to the above: group and distinguish between equally discriminatory tests21.
IV.
The ability to maintain the KB and also to extend it by introducing new strains and
their attendant test results.22
Diagnosis will be conducted by algorithm which compares the input test results with the established Knowledge Base provided by the Institute of Aquaculture. Additional information is
arbitrarily generated for demonstration purposes in the event that such information is not
available.23
The nature of the Knowledge Base was intended to be maintained via a simple .csv file.
Embedded within will, primarily, be the tests (and test names) along with the respective results
for each identified strain.
The most pertinent aspect of the KB, the results, were organised as follows:
KB[i] = [Genus, Strain, TP1, ... , TP6, TS1, ... , TSn]
20
It is important to bear in mind that it is possible for a diagnosis to match multiple strains completely
without ability to further discriminate. Two distinct strains may yield the same results for all tests.
21
If one is to decide between four strains matching your original input, and the results known for these
strains in a choice of four as-yet unknown (not yet conducted) tests are: t1(+,+,+,+), t2(+,+,-,-), t3(,+,+,+) and t4(-,+,+,+). The selection mechanism would do well to recognise that t1 would yield no
discriminatory information, whilst t2 would most significantly determine between strains. However,
t3 and t4 are equally discriminatory, and only one of them is required to offer the attainable
information relevant to distinguishing between these strains.
22
Again, this requirement has not been met.
23
This specifically pertains to costs: it may be instructive to demonstrate potential savings/direction
based on availability of resources to users in a management capacity.
- 23 -
For each strain, i, in the knowledge base, it will properly embed the data of TP1, the result
for Primary Test 1, TP2, the result for Primary Test 2 etc. These are distinguished from the
strain tests 1 through n which, once genus is established, begin identification within the genus,
e.g. determining which strains. In this manner, a strain will ideally be identified first by genus
then narrowed to a strain.
Bear in mind the concerns given coherency and duplication within the knowledge base: the
knowledge base is concerned with biological facts, the enforcement of idealised rules concerning uniqueness of entries and full spanning quality of the KB is thus avoided., The hope is
merely to capture the processes of the experts; they can cope with the complexity and overlap
of strains associated to a culture, therefore so will the Diagnostor.
3.2
Assumptions
For the Diagnostor proper, the assertion of some clear assumptions for which the entire project hinges may be made:

The Knowledge Base aspires to completeness. All entries have full, unambiguous results
noted and all entries will obey this. 24

All 2D arrays will be rectangular so as to facilitate ready navigation and reading.

Cultures in the KB are not necessarily coherent. It is possible for two such distinct strains
to be represented by cultures possessing the same assortment of values.

It is possible for strains to be identified by more than one profile. The tests do not enforce
uniqueness.

Diagnosis is conducted one sample at a time.25

A match is valid if all known and specified tests given as input are correctly identified to
strains(s) in the KB. 26
24
This can be altered. It is possible to accommodate incomplete entries, to work with partial
information. Awaiting confirmation from Dr Crumlish on the generality of such assumptions
25
If reading of inputs were automated and file-reading were enacted, this need not always apply: many
test results for many samples could be fed into a more advanced version of the Diagnostor at once,
with the GUI affording relatively easy navigation of many results, but not for this version of the
Diagnostor!
26
As mentioned previously, this could conceivably be altered: a user may elect for, say, ‘only 80% of
inputs’ to qualify for the diagnosis, given some reason for the uncertainty. Moreover, output could
be a scale of ‘most accurate matches’ down to yet a lower threshold (even as far as simply ordering
the KB with ‘most accurate matches’ given first.) Similarly, it could be conceived that rather than
- 24 -
o
A match is not returned if the given inputs do not wholly match at least one of the
strain cultures held in the KB.
3.3
Data Cleaning & Manipulation
The original .xls data file received from the Institute is noted to have the form seen in Figure
8. Though readily translated to a .csv file, the nature of the planned analysis of elements of
aforementioned data structures meant that the contents of each and every cell had to be carefully checked to ensure it adhered to the expected format. Notably, a large amount of the
entries in the .xls file had been slightly padded as follows “ + “ instead of “+”; whilst this
white space is innocuous on visual inspection (albeit very faintly untidy when inconsistently
applied), it could wreak havoc if not checked when entered into the KB.27
Though was given early on to the possibility of some parser-mechanism which would
check the entries being read in via the KBBoundary to ensure formatting, though as this functionality itself was never properly enacted, the need for a parser at such point was never
probed deeper. Nevertheless, trimming whitespace from Strings is effectively a trivial matter
in Java, albeit one that could incur further problems if overzealously or carelessly applied.
Trimming white-space and separating read-in lines on selected tokens is one thing; detecting whether a String’s contents are at all sensible, in the case of hypothetical ‘other’ options
for Primary tests, is not necessarily a trivial matter at all. To some extents fortunately, this was
sidestepped by the decision to treat only for Boolean test results when reading in the results
section of the KB. This lessens the potency of the Diagnostor, certainly, but it does allow for
ready testing of functionality.
returning ‘matches’, the returned set could be those for which it is noted not to be. These are, by
this assumption, excluded from this version of the Diagnostor.
27
Though the principle concern is indicated to be reading it into the system, it did present a task, in the
end, when the author mistakenly accessed a pre-cleaning version of the data set when attempting
to hard-write it into the code. The author had made significant progress in re-cleaning it before the
realisation of the mistake had occurred! Hardly time consuming, but somewhat vexing...
- 25 -
Review of Algorithms
3.4
At this stage it serves well to review the plan of the algorithms alluded to until now.
3.4.1
The Diagnosis Algorithm
Client tests input I[t] to be checked against knowledge base KB[r,t]
Note M which will hold matched results
ForAll t {
IF isMatch(I,KB[r]) THEN
add KB[r] to M
} return M
DefN: isMatch(I, KB[r]) = {
True:
forall t. IF ( (I[t] == KB[r,t] ) OR (I[t] = ‘?’) )
False:
Otherwise
}
IF (M is empty)
@ user:
“No matches found for your test results.”
ELSE IF (M is only one entry)
@user:
“Unique match found. Details...”
ELSE IF (M is many)
@user:
“Multiple possible matches found.”
FOR each entry in M
@user:
“<detail of each entry>”
3.4.2
Recommendation Algorithm
As noted, the process for solving the recommendations task rapidly became complex and,
though the concept was pinned down in advance, proved to be particularly vexing in its implementation.
Discursive description:
1. Consider only tests which were unspecified in the original input
2. Compose new KB’ of M and the unspecified tests.
3. Note characteristic profiles for the tests:
a. Each test’s profile
b. Whether the test is homogeneous
4. Group tests by distinct profiles (e.g. partitioning the set of tests into disjoint subsets)
5. Eliminate subsets whose profiles are homogeneous as they do not discriminate further.
- 26 -
6. For each disjoint subset unified by a distinct, heterogeneous profile, select28 one test
and add that test to set R, the recommendations. (Set R will be a set which should, it is
hoped, offer best best-possible distinction of diagnosed strains.)
7. Return set R of recommendations.
3.4.3
Recommendations with Cost Algorithm
In 3.4.2 above point 6 deals with selection from an ‘equally discriminating subset’. Here it is
wished to conjecture a manner in which this is evaluated based upon asserted cost factors. For
instance: In KB[r,t] there will be an associated pieces: KB[t,c], where c = {d, l, f}, e.g. delay,
labour and finance. These costs are associated to the tests. These are generated arbitrarily for
the purpose of this discussion, but it is assumed that, in the long run, such knowledge could be
obtained properly if desired.
Discursively, interjecting at stage 6 in 3.4.2 and continuing:
1. Access a set, P, of equally discriminating, heterogeneous subsets.
2. Note the discriminator selected by the user D(d,f,l)
3. One-value subsets are trivial and need not be considered further; their sole value is
their recommendation.
4. For non-trivial subsets, they ought be reordered according to corresponding KB[t,D],
e.g. where KB[t1,f] = £10 or KB[t2, d] = 5 hours. Assuming ascending ordering (and
that discriminator D is to be minimised), continue.
5. From each subset in P, select the lowest (first) value to be the recommendation.
As can be seen, this is conceptually not a terribly difficult problem, but it is involved and enters into the Diagnostor the concept of ordering (or rather: reordering). Until now only linear
searches of mundane (lack of) innovation have been considered, the incorporation of costs
would be a wholly more complex step.
Nevertheless, though costs were embedded into the KB and function is left to properly organise them, the actual algorithm itself is not actually implemented and little progress was
made to formalising the above to a programmable quality.
28
In the Diagnostor’s actual setup, it is such that for multiple options, the last presented choice is kept
as the recommendation: this is mere convenience from a programming perspective. It is a crude
implementation: writing each to the set, such that the final one added is the last one written, the
other options have been ‘covered’ by successive additions. Hardly a mechanism of genius!
- 27 -
4 Implementation & Function
Solution Achieved: A Working Core
4.1
Strictly, the Diagnostor achieves three of the four aims laid down early on in development:
I.
Use inputs supplied by user to achieve diagnosis
√
II. Maintain and add to the knowledge base
X
III. Functional user interface
√
IV. Advise further tests to narrow diagnosis
√
The Diagnostor returns accurate diagnoses on F. Psychrophilum as well as correctly identifying tests which can/should be conducted. The GUI is functional and allows intelligible
navigation of the input/diagnosis process, though it is far from perfect.
Some progress was made towards incorporating the costs aspect, though these are largely
superficial. The coded algorithm for assembling the recommendations is functional and accurate, but it is unsatisfactory when it comes to projected extension; it can likely be untangled
and streamlined to more intuitive code. But, as noted, it is functional.
The lack of ability to read directly from a file and to maintain the knowledge base is a severe limitation. It is still thought that the mechanism for this should be somewhat trivial,
though as the problem persisted amidst development in other parts of the project, attempts to
make progress with it (or even divine the nature of the obstacle) were eventually sidestepped
as the solution was not forthcoming and, though it would be useful, was not a critical factor in
the diagnosis/recommendation mechanism itself.
As the belief is retained that solving it should be trivial, it is hoped that, for the benefit of
the Institute, further work on the problem after submission should yield a significant improvement to the product, ideally moving the Diagnostor from a ‘proof of concept’ to a
working, almost autonomous product.
- 28 -
4.2
Walk-Through: Use of the Mk0 Diagnostor
It is now time to see the Diagnostor in progress. Consider the situation:
You are an academic of the Institute of Aquaculture, having made only cursory examinations and having eager students to assist you, you have swiftly collated the values for the six
Primary tests, the control test and two biochemical tests, the ninth and tenth available tests,
noted as C4 and C8. Both of these returned negative values. You move to your machine and
begin awaken its spirit...
Figure 14.
Initial GUI View
As can be seen in Fig. 14, the buttons available are “+”, “-“ and “?” (also “Reset” and the
“Costs” drop-box.) “Back” and “Diagnose!” are as yet unavailable. Confident, the user might
select a button given that they know the first seven tests are always “-+++++-“ for F. Psychrobilum.
- 29 -
Figure 15.
Seven Inputs Entered
And there, for Fig15, can be seen the pleasant, unthreatening entry of the first set of results.
Bolstered by this success, the user may then feel canny enough to enter “?” for Test 8, then “-“
for both tests 9 and 10, C4 and C8! The remainder, 11+, would be “?” too. Doing so...
- 30 -
Figure 16.
All 26th in!
Fig. 16 plainly shows the complete diagnosis set entered, albeit with a few mysteries. And
so, once all are entered the choice remains: reset or continue to diagnose. Presuming confidence remains, the user opts to diagnose...
- 31 -
As of Fig. 17 then shows, this choice of inputs yields a diagnosis set M of six results, notably the Flavobacterium psychrophilum of strain ID 18, 28, 40, 41, 70 and 71. Associated to
this set is the set R of recommendations who appear by their test names: cys, N-a Beta glu,
acp, Alpha-glu and Beta-glu.
Figure 17.
Diagnosis and Recommendations
And yet, Fig. 18 shows, by direct inspection of a narrowed sample of (eliminating rows as
per the algorithms described) the KB:
Figure 18.
Figure 2 Raw KB
- 32 -
That is: the Diagnostor works. It has correctly, for this case, pinpointed the recommendations
exactly as expected. It should be noted, however, that the ‘distinct discriminating subsets’
aforementioned would split as follows: 3:{val, cys}, 4:{try, chr, N-aBeta glu}, 5:{acp}, 6:{np,
alpha-glu} and 7:{Beta-glu}. Selecting the last of each gives the results seen in Fig. 14.
And, for completeness, Fig. 19 corresponds to input which yields no matches (contradicting even one Primary tests is sufficient for this result.) Whilst Fig. 20 shows a singular match
obtained.
Figure 19.
Figure 20.
No Matches
Singular Match
- 33 -
4.3
Towards Developer Testing
Testing the Diagnostor is an important aspect. Throughout the development effort has been
made to ensure that it is consistent and functional; heavy use of the Java console output was
therefore used. As can be seen in Fig. 18, there is detail and manipulation going on ‘under the
bonnet’ (ideally sans bee) which is both somewhat complicated and largely irrelevant to the
academic using the Diagnostor provided it works. If it were to not work, by giving wrong or
erroneous answers then any potential user who was not the developer would almost be as
quick conducting the whole operation themselves as checking the details as it progresses.
With that in mind, it is hoped that testing throughout the development of the Diagnostor as
it stands has been sufficient to ensure if it is both properly functional and does not offer any
bizarre outputs to the user. Of particular concern, given the sheer number of arrays and dependencies involved in the Diagnostor’s interior working, one would wish to see the limits.
Care has been taken to ensure that array limits are linked to variables, such that if, say, the
size of input were to be changed, everything else would correctly flow: only a few minor details would need to be changed at this stage. Otherwise, much of the variability is effectively
fixed and, if it works at all, it should be consistent. Due to the nature of the Diagnostor’s existence as a ‘core’ piece to what is expected to be a larger project, it is with some satisfaction that
it is viewed working stand-alone.
4.3.1
Extremes & Boundary Testing
Consider the check inputs:

All “-“s:
Diagnostor correctly yields no matches found, as per Fig. 19.

All “+”s:
Diagnostor correctly yields no matches found, as per Fig. 19.

All “?”s:
The trivial case, the Diagnostor yields the entire KB as a diagnosis,
with all 16 heterogeneous tests as recommendations, see Fig. 21.
Input of any test set which is non-discriminating (e.g. only the Primary tests and the control, as
“-+++++-“ will yield the result analogous to the trivial case above.
Attempting to return all the way back from the cusp of inputting test 26 (fuc) will
work correctly, though the inputs already in place remain displayed. This is not an error, as
such, but it is something that users may find perplexing: when moving back entered inputs
will not be erased unless specifically overwritten by contradiction. Only selecting ‘reset’ will
properly remove all inputs.
- 34 -
4.3.2
Notable Errors & Malfunctions
The only notable error at present, which is ultimately more of an oversight, is that as the “Diagnose!” button is activated on completing all twenty-six inputs, the “Back” button is also
disabled. This is a minor concern, but it is likely something which may skew users and certainly, in hindsight, seems a baffling decision to have made. As noted, it is surely an interaction
oversight which remains to be corrected in future versions.
Figure 21.
The Trivial Case
- 35 -
5 Evaluation
Wherein the Mk0 Diagnostor is reviewed and assessed by the developer.
5.1
Critical Review of the Mk0 Diagnostor
The Diagnostor works very effectively: for a properly formatted KB it does indeed give the
correct diagnosis and happily yields a correct set of ‘best’ recommendations minimally29 required to properly distinguish between possible diagnosed strains.
It is regrettable that the Mk0 Diagnostor is not capable of reading directly from a file. Of all
the success displayed, it is effectively a very minimally useful tool. Time saving, certainly,
especially when quickly adapted, but it applies presently to but one genus, that of F. Psychrophilum, hardly an astounding contribution to the progress of microbiological science.
However, it does clearly demonstrate the immense time-saving ability of a wider, more
comprehensive system. Indeed, the mechanisms developed to conduct diagnosis and compute
for the recommendations is a significant achievement, if not a particularly resounding one.
5.2
What was not achieved
As previously noted, the critical failure of the project so far is the lack of functionality in reading directly from ‘outside’ knowledge bases. Perhaps effort to establish this would best have
been applied prior to even accessing the knowledge base supplied by the institute of aquaculture.
Nevertheless, room for functionality is allowed. The KBBoundary.java file has several
methods and a constructor ready to be extended and tweaked to allow this function. Conceptually, the Mk0 Diagnostor, though complete, is not the ‘ultimate’ finished product: more work
has to be conducted on it.
The Graphic User Interface, though functional and well established, is poorly implemented
in the code, a stand-alone interface linking to an internal Diagnostor which in turn accesses a
knowledge base has been a long-standing vision held in mind in this project. Work on this
front is hoped to be continued after submission, allowing a further separation and partition
function now that the core concept of diagnosis is demonstrated to be effective and achieved.
29
It is widely noted throughout that the terminology used is very relaxed. Formulation of the problem
in rigorous, formal terms was beyond the scope of this project, so the idea of a ‘minimal set’ should
be understood as a mathematical or set theoretical assertion!
- 36 -
5.3
Deployment Solution
A primary feature of user response to testing is that the Mk0 Diagnostor is not a stand-alone
application. Though this is readily accepted by the developers, it is unfortunately not so useful
for the clients or those who originally sought IT support in the institute.
Although it is effective, this presents ready consideration for future development. The selfcontained nature of the project means it is somewhat easier to deploy and demonstrate, with
setup amount merely to configuring a suitable environment (e.g. installing Eclipse, BlueJ, see
Appendix A). This is a sorry situation, but hardly insurmountable.
Though by no means outside the scope of this project, it was ultimately outside the project’s reach: the work put in to even beginning the Diagnostor project, conceiving it and
understanding the Institute’s requirements (as well as the time spent in acquiring the sample
knowledge base originally) meant that only a certain amount could feasibly be implemented.
Settlement for a skeletal core, or perhaps more accurately a stand-alone ‘brain in a jar’ that
is the Mk0 is a suitable compromise. It sets a vital organ into place and demonstrates an effective core concept with a clear ability. In many respects this depth first approach cuts straight to
the heart of the initial problem, though it is hamstrung by its lack of immediate reapplication
to even slightly different problems. (See On Maintenance in Appendix A for further discussion of reapplication of the Mk0 Diagnostor directly.)
5.4
Reflection on User Feedback
Appendices E & F reflect user feedback as exposed to Appendices B, C and E, along with use
of the actual Mk0 Diagnostor. Unfortunately several other questionnaires which had been assembled have since been lost, but their knowledge remains.
It is almost unanimously said that the Interface is seriously flawed. User feedback clearly
exhibited the short sightedness and lack of perception involved in the creation of the User
Guide and Installation Guide (Appendices B and A respectively), even though most users never saw the latter!
As has been apparent, it is developer expertise which has been key in allowing the test users any hope of ensuring the Diagnostor’s proper functioning. The most damning criticism is,
perhaps, how far from obvious what it is supposed to do is. Though this is not directly from
our contact within the Institute who has unfortunately not yet been available for trialling it, it
marks a serious consideration for tweaking and improving the Mk0 even before initial deployment after submission.
- 37 -
Having said the above, it is also the case that users were indeed mildly impressed by the
functionality of the Mk 0 Diagnostor: that it quickly and readily achieves a diagnosis. Though
hardly a remarkable achievement given technology as it stands, as a very specific replication
of the function previously conducted at length by academics, it does appear to meet that demand well.
Most pressingly, the user evaluation highlights the unfinished nature of the Mk0. Lacking a
trouble-free GUI (many users note the remaining clumsiness/tediousness of the method of input), multiple functions (such as adding to the knowledge base or manipulating it) and even a
thoroughly organised, intuitive structure of code, the Diagnostor plainly has serious flaws.
Nevertheless, these are acknowledged and to some extents anticipated: the design of the GUI
was never intended to be a significant portion, merely one facilitating function and demonstrating room for expansion.
To this extent, the lesson of the feedback is clear: that mechanically sound as the Diagnostor appears, it is still some distance from a well formed product; it is a working core, but
requires more and better formed layers to allow successful adoption.
- 38 -
6 Conclusion
6.1
Approach
Strictly, there were some fundamental errors and oversights in this approach, as well as a particular, resounding success.
Clear progress has been made in aiding the information technological plight of the Institute
of Aquaculture. Comprehending the situation facing their diagnostic consultancy was key to
even beginning work on this problem and, as can be seen in Sections 1 & 2, this has been well
explored, with insight gained into potential developments in many other directions beyond
simply aiding in automating/augmenting their diagnostic process.
This aspect was a significant point of the first half of the time spent on the project. Alongside it was spent many hours considering the longer term design of the project. Though the
GUI is functional, it is not expected to remain as part of the Primer.java class, indeed it is a
critical oversight early in development that meant it remained within that! (Time was not
available to undo that error, alas.)
In many respects, the project transpired to be somewhat larger than the scope of one summer project, certainly given its approach. Starting from the ground-up, designing and holding
the diagnostic software internally in a Java project consumed much time and brainpower, but
nevertheless was deemed necessary (and vindicated) by clearly implementing of the perhaps
slightly less clear, more ephemeral vision of the Diagnsotor held initially.
6.2
Deployment
As noted, the Diagnostor as a stand-alone piece is both useful, with regards to demonstration and supporting the concept that yes indeed, the diagnostic technique of the Institute can be
readily augmented with a little IT insight. That said, its lack of interface with a dynamic
knowledge base is a critical failure, one which it is hoped will be rectified so as to fully allow
the Institute well realised, readily used augmentation in its Diagnostic Consultancy.
6.3
Future Work
Several aspects have been noted throughout this project as avenues for future work, these can
be distinguished in two primary ways. First those that pertain to the Diagnostor, secondly
those that pertain to the Institute’s IT plight more broadly.
- 39 -
6.3.1
6.3.1.1
Expanding the Diagnostor
Incorporation of Costs into the Recommendation Algorithm.
The commented-out code:
Compose subCosts[][]; prepSub2D(subCosts, costsBase2, numbTestsRemaining, noCosts,trivialCase, setM, setTs);
and the method
organiseProfiles (int testLim, int[][] subTestSource, int[] mapTarget,
int[] subTarget, int[] testsSet) {...
form an excellent and ready starting point for any such further implementation, given the
algorithm conjectured in Section 3.4.3. Though not requested in the requirements elaborated
by the Institute, pairing such information accurately could be made to be an effective organisational tool.
6.3.1.2
Reading/Writing to a free-standing Knowledge Base.
Critically the Mk0 Diagnostor only functions with respect to F. Psychrophilum. This is an embarrassingly small capacity.
6.3.1.3
Proper representation of all test options (beyond positive/negative).
Limited somewhat in development by only having access to the F. Psychrophilum KB, future
development will seek to capture the nuance and possibility allowed by the Primary tests,
making determination of genus something that the Diagnostor itself may achieved, or potentially allowing both the Diagnostor to do so on its own, or to allow the user to override this
and make the selection themselves. There are many possibilities in how this might be implemented.
6.3.1.4
Non-linear entry of test inputs.
Consideration was given early on to non-linear entry, but with the indication that the Institute
conducts the biochemical tests as part of a suite of standard tests, this linear approach was kept
for convenience, e.g. the option to simply select the ‘name’ of the test (for example C8, Betaglu or Fermentation) and then be presented with options relevant only to that test. With large
tests inputs it may be unwieldy, but testing and inspection of a wider knowledge base would
readily inform any decisions made here even from the very outset.
- 40 -
6.3.1.5
Controlling Level of Detail
Currently the Mk0 only offers the strain name and ID as diagnosis and the test name as recommendations: though suited for a well informed academic, it is not all that it could be.
Allowing for the option of adjusting the level of detail given in a response (what the
expected results of a test would be), perhaps giving the results in a more (or less) conversational manner, allowing the tabling of information and so forth are all aspects that a client may
well request, expect, demand or simply be pleased to see appear: nuance that an IT developer
may understand and perceive, but that a client might overlook or ignore as it is not precisely
what they need at any given time.
In essence, with the information available, the Diagnostor could be made to allow users
more intuitive exploration of the knowledge base and the mechanisms available, not just the
mechanical “input-mystery-output” process with which people may settle.
6.3.1.6
Development of the Diagnostor by other means
The mechanism and principle inherent in the Mk 0 Diagnostor is not restricted to Java by any
means. Much of the work is essentially language independent. Expansion and formalisation of
the algorithms, conducting analysis based on efficiency and potential for automation is certainly a valid starting point. With that in mind a great many approaches could be undertaken.
A web-hosted PHP script which interfaces with a KB held at the Institute as a relational database might well be efficacious, though it has certain security ramifications that are not as
prominent as with a desktop Java application. Alternatively the production of a mobile phone
App which is provided for use by Institute’s academics might well be a novel solution too.
There are many possibilities and the enterprising student or developer may indeed wish to
consult the Institute with such ideas.
6.3.2
Applying IT solutions to the Institute’s knowledge resources
As has been discussed previously, the Institute is host to a large collection of knowledge which
is not widely available; so much so that they reject other diagnostic software available because
it is unable to incorporate their own knowledge sufficiently in a way that is better (albeit with
more time consumed) handled by their own academics inspecting by eye!
The Diagnostor is only one possible solution. Though conceived as potentially via Python,
though ultimately implemented in Java, there are many other deployment mechanisms. Above
are noted simple conjectures for translating the Diagnostor to other formats, such as a phone
App or a web-based service, there are more fundamentally different possibilities too.
- 41 -
The application of case-based learning or indeed data mining to the Institute’s knowledge
may be feasible, should they be prepared to participate in an endeavour such as that. Automated reasoning may well offer more direct progress in Aquacultural studies, albeit by
interdisciplinary application of more commonly Computer Science solutions.
6.4
Concluding Remarks
In brief, it is pleasing to note that the Mk0 Diagnostor does indeed work. It is a small step in
development, using basic techniques and only minimally complicated data structures to
achieve a task. It is hoped that the form of the code is readily intelligible for any who might
use it to expand the Diagnostor beyond its currently somewhat limited capacity.
Nevertheless, it stands as a stunted testament indicating many plausible and attractive options for the future. It is a step in the right direction and a way-marker for any who might
progress in the same direction. With only some more development focus and input, it could be
expanded readily to something very easily used by the academics of the Institute for Aquaculture, perhaps more broadly useful not only as a diagnostic aide, but if the cost aspects were
factored in, as a management tool able to fully begin increasing efficiency beyond the simple
matter of the diagnostic process itself.
Knowing it was hoped that the final stage would be more advanced, it is not with too heavy
a heart that one may view the Diagnostor. Detailed inspection of the code may reveal some
unwieldy statements and easily improved structures; it is hoped that such improvements are
not too blatant or too embarrassing for the developer. The effort and insight fed into the project
feel well rewarded in seeing that it is indeed effective.
Though the recommendations offered by the Diagnostor are the result of brute force, it is
reassuring to know that they are nonetheless well formed. As the complexity of the recommendation process became clear, it bolstered resolve in development: the task was set and the
challenge accepted. It is not a general solution and there is a capacity for oversight provided
by it (the Diagnostor makes no mention of the population of each ‘non-homogeneous distinct
subset’ merely that a selection has been made and it presents the user with that selection: it
may have been the only choice!), but it is felt that this is an acceptable trade-off for the speed
with which the Diagnostor achieves its result over the more elaborate and time consuming approaches conducted by experts in optimisation, or even by the lay working to their own
unguided tune.
With three of the four aims achieved, it is difficult not to view the Diagnostor as only 75%
achieved. Still, it is encouragement for future development and a foundation or inspiration for
further study, a cliché perhaps, but a justly founded one.
- 42 -
References
Note: As little work is widely available on clinical diagnostic tools, largely due to their commercial nature, much work on the Mk0 Diagnostor has been conducted ‘from scratch’, though
it is hardly innovative enough to be considered remarkably original.
[1] University of Stirling, Institute of Aquaculture, Integrated Health Management ‘Consultancy Home’, http://www.aqua.stir.ac.uk/diagnostic/, September 2011.
[2] International Code of Zoological Nomenclature, supported online by the John Spedan
Lewis Trust, http://www.nhm.ac.uk/hosted-sites/iczn/code/, September 2011
[3] B.A. Austin & D.A. Austin, Bacterial Fish Pathogens: Disease in Farmed and Wild Fish,
Second Edition, Ellis Horwood, 1993
[4] bioMerieux’s apiweb application, http://www.biomerieuxdiagnostics.com/servlet/srt/bio/clinicaldiagnostics/dynPage?doc=CNL_PRD_CPL_G_PRD_CLN_12, September 2011
[5] John R. Hubbard. Schaum’s Outline of Data Structures with Java, Second Edition. The
McGraw Hill Companies Inc., 2007.
[6] Masterchef Synesthesia, http://www.wordmagazine.co.uk/content/masterchefsynesthesia-or-i-like-the-buttery-biscuit-bass, September 2011
[7] B.A. Austin & D.A. Austin, Bacterial Fish Pathogens: Disease in Farmed and Wild Fish,
Fourth Edition, Praxis Publishing Ltd., UK, 2007
[8] V. Inglis, R.J. Roberts, Niall R. Bromage, Bacterial Diseases of Fish, First Edition,
Blackwell Scientific Publications, 1993.
[9] DANIWEB, IT Discussion Community,
www.daniweb.com/software_development/java/threads/17262
- 43 -
Bibliography
Throughout the project, many works were consulted in general, offering a guiding inspiration
and insight to the course of the project.
Generally Consulted Texts

David Avison & Guy Fitzgerald, Information Systems Development: Methodologies,
Techniques and Tools, Third Edition, McGraw-Hill Publishing, 2003

A Chetwynd and P. Diggle, Discrete Mathematics, Butterworth-Heinemann, 2003

Robert J. McEliece, Robert B. Ash, Carol Ash, Introduction to Discrete Mathematics,
International Edition, McGraw-Hill Book Company, 1989

Steven S. Skiena, The Algorithm Design Manual, Springer-Verlag, New York Inc.,
1998
Widely Consulted Web Resources & Sites

Java Samples, http://java-samples.com/, last accessed September 2011

Stack Overflow, http://stackoverflow.com/, last accessed September 2011

Oracle’s Java Technical Documentation, http://download.oracle.com/javase/, last accessed September 2011
Other Resources

Course notes and material from University of Stirling ITNPs 11, 21, 62 & 92 and University of St Andrews CS1002, MT2002, MT3501 & PH4030.
- 44 -
Appendix A: Installation Guide
1. Download & install Java: http://www.java.com/en/download/
2. Download
&
install
a
code
compiler:
http://www.bluej.org/download/download.html
(Only if a Java development environment has not
been prepared. If Eclipse or Dreamweaver available, those make more sense.)
3. Load project into editing environment
4. Run the project (BlueJ specific: right click on the
box entitled Primer, click on “public static void
main” )
5. Witness the GUI appear onscreen.
Note: The KB is hardwired into the code at this time.
On Maintenance
It is worth noting that the Mk0 Diagnostor has no active facility for expanding its knowledge
base or receiving updates to it. Presently this is only achievable by directly altering the representation held prominently as “private String kbPRoxy[][] =...” in the class’ file
KBBoundary.java . Artefact code is leftover demonstrating attempts to allow reading directly
from a file, but it is not functional.
This is terribly unwieldy, but it is possible. If this is done then the corresponding variables
noted beneath it, e.g.:
final private int name row = 0, nameBuffer =2,...
must be ensured to be correct alongside a modified KB. Finally, the initialisation variables in
class file Archive.java must also be checked for correct correspondence, notably:
final int noTests = 26;
Though cumbersome, an expected extension to the Diagnostor will to allow loading directly
from .csv file.
- 45 -
Appendix B: User Introduction
Welcome to the Mk0 Diagnostor.
The purpose of this information system is to present an interface and mechanism for which
an academic concerned with, say, disease causing bacteria in farmed fish populations may diagnose a test sample by comparison with an established knowledge base.
By receiving, sequentially, the real-world results for so-called Primary and biochemical
tests, the Diagnostor then feeds this information into its interior. Within, the information supplied by the user is analysed and compared against a preferred information set, known as the
knowledge base.
The Diagnostor will return the information gained from this comparison and analysis to the
two output fields visible in the Diagnostor. The upper of these two will show the results of the
actual diagnosis, i.e. the strains (if any) which fit the results given by a user.
The second of these will display the recommended tests which are needed to best distinguish precisely which strain is being considered if there is more than one.
Under the Bonnet
Throughout use and trial of the Mk0, you will find it offers some insight into the nature of the
F. Psychrophilum knowledge base with which it is concerned. If you feel confident in accessing the Java editing environment, you will notice that the console output also has some
underlying facility which shows the activity of the internal workings, particularly the binaryrepresentation and classification process of the recommendation-determination process.
There are “System.out.println” lines which have been commented out but which still
persist within the code to aid this exploration. In Archive.java, the method private void organiseProfiles(int, int[][], int[], int[], int[]) has a particularly insightful such point.
- 46 -
Appendix C: User Guide
Welcome to the Mk0 Diagnostor. I hope you enjoy your science!
Operating the Diagnostor is a simple process:
1. Activate the Diagnostor, you
may wish to look for the following in Eclipse:
2. Know the details which you
wish to enter.30 There are
twenty-six separate tests, though many will likely be entered as “?”.
For demonstration, try Test 9 and10; C4 and C8 as “-“. “?” for the rest.
3. Click the option to be entered for test one (in the F.psych. case always “Gram” which has
value: “-”), by clicking on the relevant option.
You should find the Diagnostor updates to the next test, with the diagnostic set line
reading:
4. Progress through the tests, entering unknown/unspecified ones as '?'. This is a sequential
process, but you can move back in the sequence by utilising the “Back” button. You may
restart the entire process by selecting “Reset” in the lower corner.
5. After entering the relevant tests, you will arrive at the following setup:
You should feel comfortable with pressing “Diagnose!”
6. Reap the rewards from the output!
30
For Flavobacterium psychrophilum, the first six tests are always entered as “-,+,+,+,+,+,+” and the
seventh test, the control test, is known to be always “-”. These can be entered also as “?” which
gives the same results; but entering anything aside from this (e.g. test 1: +) will yield no results for
this KB.
- 47 -
Appendix D: Legal Note – Institute of Aquaculture & Data
Protection
The data handled in the M0 Diagnostor has been released confidentially by the Institute of Aquaculture, part of the School of Natural Sciences at the University of Stirling. This information
is legally protected and should only be handled by those authorised by the Institute of Aquaculture.
The information is assembled so as to ensure the anonymity of the clients of the Institute;
this had been conducted prior to its provision for this project. Care has been taken to ensure
that this confidentiality is upheld.
- 48 -
Appendix E: Questionnaire
Greetings!
I hope you have found use of the Diagnostor a pain-free experience. I would be greatly indebted if you would complete this for feedback and reference purposes. The scale is ‘1-5’ as in
‘Poor to Great’.
Q. Zero
How useful did you find the User Guide?
Q. One
How well do you comprehend the purpose of the Diagnostor?
Q. Two
How did you find the output from the Diagnostor?
Q. Three
The Diagnostor was <?> to use?
Q. Four
Did you find the Diagnostor met your expectations?
Q. Five
How professional do you feel the Diagnostor is?
Q. Six
How useful do you feel it would be (if you had to diagnose bacteria
regularly)?
And, if you would be kind/critical enough, I would enjoy any (cruel) feedback on the aesthetic and usability of the Diagnostor. If you have comments on the design of the interface, I
would be keen to hear your thoughts...
Finally, if you have suggestions for any functional aspects of the Diagnostor, or indeed, any
criticisms in general, I would be grateful to receive them...
Your Name:
Contact Detail:
- 49 -
Appendix F: Questionnaire Responses
- 50 -
- 51 -
Download