haystack-script

Haystack is a personal information client which can be tailored to an individual’s
requirements.
Within the context of bioinformatics and myGrid we have been investigating its use in the
presentation of experimental results.
1) files 0 to 1m30s (There are some long pauses that I haven’t yet edited out. Perhaps
this is the time to say that using Java and RDF can lead to slow applications.)
The first step is to make Haystack aware of the result files produced by the workflow.
Haystack is provided with interfaces to the basic file system, CVS, WebDAV and LSID servers,
and can be extended.
Here we browse a portion of the local file system, focusing on the Williams syndrome
results. Clicking on the file “scanreport” reveals a view of standard file properties and
commonly used metadata such as Dublin Core. Haystack can be configured to detect specific
MIME types and act on that information. Currently parsers exist for extracting additional
domain-dependent metadata from GenBank and PDB records.
2) collections 1m30s to 2m55s
A powerful feature is the ability to organise resources such as files in multiple ways. Here
we create a new list based on all resources involved in the Williams syndrome work: not
only files, but also descriptions of researchers, groups and projects. We are able to find
resources not just by their interrelationships but also by more conventional text search
techniques (important early in the process of managing information). We enter the text
“protocol notes” into a search box to find some handwritten notes about the experiment,
then search for “Williams resources”, a collection of relevant resources we made earlier. We
end up with a simple list of these two resources. Next we examine the contents of the nested
collection.
3) ontologies 2m55s to 3m35
Many of the resources have been explicitly typed with ontological concepts. Resources
labeled with a C are examples of concepts used to type Williams resources. Some are
general and reflect the myGrid information model, such as Group, LabBook and Person.
Others are specific to this work and type data such as a BLAST comparison report.
Information within the ontology describes how each type of resource can be related to
other resources, i.e. a schema. So, for example, a person can have an email address, etc.
3) standard views 3m35 to 4m00
This ontology schema information is used to construct views of resources which can be
further edited.
Here we see the view used to present information about a person, in this case the
researcher Hannah.
Each resource can be represented by multiple views, and information in those views can
be built up from complex queries spanning many relationships.
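
The idea of building a view from schema information can be sketched roughly as below. This is only an illustration in Python, not Haystack's own mechanism: the SCHEMA table, the property names other than email, the build_view helper and the example address are all assumptions made for the sketch.

```python
# Hypothetical schema: which properties each type of resource may carry.
# Only "email" for a Person comes from the script; the rest is illustrative.
SCHEMA = {
    "Person": ["name", "email", "memberOf"],
    "Group": ["name", "project"],
}


def build_view(resource):
    """Return (property, value) rows for the properties the schema allows.

    Properties that the resource does not yet have are shown as empty
    strings, so the view can act as an editable form.
    """
    fields = SCHEMA.get(resource["type"], [])
    return [(field, resource.get(field, "")) for field in fields]


# Made-up record for the researcher shown in the video.
hannah = {"type": "Person", "name": "Hannah", "email": "hannah@example.org"}
person_view = build_view(hannah)
```

Because the rows come from the schema rather than from the resource itself, adding a property to the schema would add a field to every view of that type.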
4) operations 4m to 4m25
Operations can be linked to specific types of resources. For example, here we see that
browsing on a researcher gives us the operation of emailing that researcher, which in turn
provides us with a partially filled-out email form view.
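
One way to picture operations being attached to resource types is a simple registry keyed by type. The Python sketch below is an assumption made for illustration, not how Haystack implements operations; the OPERATIONS table, the operation decorator, compose_email and the example address are all hypothetical names.

```python
# Hypothetical registry mapping a resource type to the operations offered for it.
OPERATIONS = {}


def operation(resource_type):
    """Decorator that registers a function as an operation for one type."""
    def register(func):
        OPERATIONS.setdefault(resource_type, []).append(func)
        return func
    return register


@operation("Person")
def compose_email(resource):
    """Return a partially filled email form addressed to the person."""
    return {"to": resource["email"], "subject": "", "body": ""}


def operations_for(resource):
    """List the names of the operations registered for this resource's type."""
    return [func.__name__ for func in OPERATIONS.get(resource["type"], [])]


# Made-up record for the researcher shown in the video.
hannah = {"type": "Person", "name": "Hannah", "email": "hannah@example.org"}
offered = operations_for(hannah)
form = compose_email(hannah)
```

Here browsing to a Person offers compose_email, and invoking it yields a form with only the recipient filled in.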
5) LSIDs 4m30 to 5m
Much of Haystack has been developed for general tasks such as email and meetings.
However, the developers are now focusing strongly on bioinformatics.
Browsing on a resource identified by an LSID results in a series of calls to an LSID
authority to bring back the relevant data. Here a GenBank LSID authority is asked for a
GenBank record. The record is then parsed and relevant metadata displayed in a
GenBank-specific view. Bespoke UI components can show graphical displays of sequence. Note
this is a circular map but the DNA is linear!
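
Before an authority can be asked for data, the LSID itself has to be split into its parts. An LSID is a URN of the form urn:lsid:authority:namespace:object, with an optional revision at the end. The Python sketch below shows only that parsing step, not the calls to the authority; the Lsid type, the parse_lsid function and the example identifier are made up for illustration.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str           # who issued the identifier
    namespace: str           # e.g. the database the object lives in
    object_id: str           # identifier within that namespace
    revision: Optional[str]  # optional version of the object


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"LSID has an empty component: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(authority, namespace, object_id, revision)


# Hypothetical identifier; example.org is not a real LSID authority.
record_id = parse_lsid("urn:lsid:example.org:genbank:AB000001")
```

The authority part tells a client which server to contact, and the namespace and object parts say which record to fetch from it.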
Operations have also been tailored to bioinformatics types. In this case viewing in qmol
is suggested (wrongly!)
We would like to extend this to
5) relationships 5m10 to 6m
A central feature of myGrid metadata is the dense network of relationships between
resources. For example the origin of a data file may involve many relationships with
intermediate results and services.
Haystack provides a mechanism to navigate these relationships as a graph. Here we see a
web of resources relevant to one run of the experiment. We can follow the derivation path
of result data from initial sources right through to final results.
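
Following a derivation path amounts to walking a graph of "derived from" links. The Python sketch below is a minimal illustration under that assumption; the resource names and the derived_from table are invented and do not reflect the actual myGrid metadata.

```python
from collections import deque

# Hypothetical provenance links: each resource maps to what it was derived from.
derived_from = {
    "final_report": ["blast_report"],
    "blast_report": ["query_sequence", "blast_service"],
    "query_sequence": ["genbank_record"],
}


def trace_origins(resource, links):
    """Return every resource reachable by 'derived from' links, nearest first."""
    origins = []
    visited = {resource}
    queue = deque([resource])
    while queue:
        current = queue.popleft()
        for parent in links.get(current, []):
            if parent not in visited:  # guard against shared or cyclic links
                visited.add(parent)
                origins.append(parent)
                queue.append(parent)
    return origins


origins = trace_origins("final_report", derived_from)
```

A breadth-first walk lists the closest intermediate results before the more distant original sources, which matches reading the graph outwards from a final result.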
6m00s to 6m43s
Not all information is best viewed in this format. At any time a resource can be shown in
property view by clicking on hyperlinks, and more conventional web page navigation
undertaken. In this case we have taken a simplified BLAST result file, which lists a
number of GenBank sequences identified using LSIDs. Clicking on one of these produces
the GenBank record view we saw earlier (but this time for a different sequence).
Not yet in video.
6) active tasks
Unlike many UI environments in which each task must be completed in one session,
Haystack allows tasks to be parked awaiting further information. Here we show an active
task of sorting through a collection of sequences to review.