

Common Tool Technical Group Meeting

August 12, 2010

Attending:

PDBe: Tom, Jawahar

RCSB: Jasmine, Zukang, Raul, Vladimir, Dimitris, Terry, Martha, and John

BMRB: Steve

PDBj: Takanori

Note change: The next Technical Team meeting will be Friday, August 20, 2010 at 9:00AM EDT, 2:00PM UK, 10:00PM Japan and 8:00AM Wisconsin. Agenda and list of required participants to follow.

Meeting Summary and ACTION ITEMS:

WFE/WFM overview:

Level 1 summarizes the status of all depositions in play for a specific annotator.

Level 2 enables the full workflow overview and management for a single deposition, as well as providing the inter-module GO BACK functionality.

Level 3 provides the log file drill-down for a specific workflow module (e.g., sequence alignment or ligand processing).

1. WFM Level 2 design allows for the depiction and navigation of multiple workflow classes (ligand, sequence, …) for a single deposition. Level 2 will include the “Go Back” functionality. (See Tom and Luana’s screenshots of scenarios for a working sequence example, included with the agenda.)

a. The workflow graphical displays will be done using the JavaScript Raphael library. Tom will provide a mockup of this graphical display by next week. The back end of the graphical displays will take a couple of weeks and will not be part of the August delivery. This work will be assigned to the new PDBe systems engineer.

b. How will GO BACK be depicted (Jasmine)?

ACTION ITEM: Tom to mock up the interface functionality to include/demonstrate GO BACK.

c. Tasks are most typically done serially (80:20 rule), with multiple steps conducted in the background before annotator intervention.

ACTION ITEM: Tom will mock up a depiction of the business logic and assumed order of tasks for the 80% scenarios.

d. In the special cases where the annotators need to change the automatic serial order, how will this be managed at Level 2? Interactions between workflows need to be addressed. An example would be a new modified amino acid or nucleotide, which would have to be handled in ligand processing before sequence processing could be completed.

ACTION ITEM: Tom will mock up a draft of this interface functionality.

e. Can tasks be started from the Level 2 display? Yes, the WFM can run both in batch mode and in incremental mode.

f. How will the workflow detect if it should be run (for example, does the entry contain a ligand)? The business logic will define the automated sequence and the exceptions that require annotator intervention. One model is that each major set of tasks will run as far as possible automatically and then indicate through the WFM that annotator input is needed.

g. How will major tasks be scheduled to minimize annotator waiting time?

Long tasks like reference sequence database searches should be done automatically in the background. Many of these tasks will in fact be run during the deposition process prior to submission. Level 1 will only provide status information for each deposition: wait, running, or finished.

h. Graphical depictions will also be provided with the Level 3 display, drilling down into specific functional components (sequence or ligand processing).

i. A global report summarizing all issues for an entry is needed to provide the annotators with the context necessary for processing an entry. How will this be addressed in the WFM?

ACTION ITEM: Tom to provide further mockups for group review.
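
For discussion, the "run as far as possible, then wait for the annotator" model described in item 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the WFM design: the class names, the task names, and the entry fields (has_ligand, new_ligand, resolved) are all hypothetical. Only the three status values (wait, running, finished) come from the notes above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Status(Enum):
    # The three Level 1 status values named in the minutes.
    WAIT = "wait"          # stopped until an annotator provides input
    RUNNING = "running"
    FINISHED = "finished"


@dataclass
class Task:
    name: str
    applies: Callable[[dict], bool]          # business logic: should this task run at all?
    needs_annotator: Callable[[dict], bool]  # exception that requires intervention?


@dataclass
class Workflow:
    tasks: List[Task]
    position: int = 0
    status: Status = Status.RUNNING
    log: List[str] = field(default_factory=list)

    def run(self, entry: dict) -> Status:
        """Advance serially until all tasks are done or annotator input is needed."""
        self.status = Status.RUNNING
        while self.position < len(self.tasks):
            task = self.tasks[self.position]
            if not task.applies(entry):
                self.log.append(f"skip {task.name}")
            elif task.needs_annotator(entry):
                # Stop here; calling run() again later resumes at this task.
                self.log.append(f"wait {task.name}")
                self.status = Status.WAIT
                return self.status
            else:
                self.log.append(f"run {task.name}")
            self.position += 1
        self.status = Status.FINISHED
        return self.status


# Hypothetical task list: a new ligand must be resolved before alignment completes.
tasks = [
    Task("sequence_search", lambda e: True, lambda e: False),
    Task("ligand_processing",
         lambda e: e["has_ligand"],
         lambda e: e["new_ligand"] and "ligand_processing" not in e["resolved"]),
    Task("sequence_alignment", lambda e: True, lambda e: False),
]

entry = {"has_ligand": True, "new_ligand": True, "resolved": set()}
wf = Workflow(tasks)
first = wf.run(entry)                      # stops at ligand_processing
entry["resolved"].add("ligand_processing")  # annotator handles the new ligand
second = wf.run(entry)                     # resumes and runs to the end
```

In this sketch the ligand task is skipped entirely for entries without a ligand, which is one possible answer to the "how will the workflow detect if it should be run" question; the real business logic is still to be defined.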

2. Log file management. Tom has captured logging output in named files (deposition id, workflow class, workflow instance) that can be viewed by the WFM.

ACTION ITEM: This should be available for view/testing by August 17.
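
As an illustration of the naming scheme in item 2, the sketch below builds a log file path from the three keys and recovers them again. The directory layout, the function names, and the example identifiers are assumptions made for this example; the minutes only state that the files are named by deposition id, workflow class, and workflow instance.

```python
import re
from pathlib import Path
from typing import Tuple

LOG_ROOT = Path("wf_logs")  # hypothetical location, not from the minutes

_SAFE = re.compile(r"[A-Za-z0-9_-]+")


def log_path(deposition_id: str, workflow_class: str, workflow_instance: str,
             root: Path = LOG_ROOT) -> Path:
    """Return root/<deposition id>/<workflow class>/<workflow instance>.log."""
    for part in (deposition_id, workflow_class, workflow_instance):
        # Reject anything that could escape the log directory or break parsing.
        if not _SAFE.fullmatch(part):
            raise ValueError(f"unsafe log name component: {part!r}")
    return root / deposition_id / workflow_class / f"{workflow_instance}.log"


def parse_log_path(path: Path) -> Tuple[str, str, str]:
    """Recover (deposition id, workflow class, workflow instance) from a path."""
    path = Path(path)
    return path.parent.parent.name, path.parent.name, path.stem


# Example with made-up identifiers.
example = log_path("D_1000000001", "sequence", "W_001")
```

Keeping one directory per deposition would let a viewer list every workflow class and instance for an entry with a simple directory walk, but a flat file name with the same three keys would work as well.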

3. Sequence module testing. Jasmine reported results of sequence module testing by the RCSB annotation group. The level of activity in this “load test” initiated by the RCSB staff was more than the “test” instance system could handle (17 concurrent processes). The memory load distribution and management implications of this unintended load test need to be better understood.

This result raises the following issues/questions:

a. The users had no way of knowing that the process they were waiting on had been “killed”. How will the interface alert users of such occurrences in the future?

ACTION ITEM: Tom to suggest solution.

b. If the workflow server died because the Linux operating system detected a VM overload, what are the full implications for our design? Questions about exactly how the Linux system deals with VM overload have been provided by Steve Mading in his Aug 12 email to the full team, which describes what appears to be a dysfunctional family reality show. More discussion to follow.

c. What exactly happened during this load test? Since log files are not available, we will need to rerun this exercise once the capture of log files is in place (ETA: August 17).

ACTION ITEM: Jasmine to coordinate with Tom and John to rerun the “load test” scenario.

d. What are the minimum memory requirements for the component tasks over a range of deposition sizes?

ACTION ITEM: John will provide some estimates of memory requirements for the executables used in the sequence module.

e. How many active processes should we expect under normal circumstances, both now and with the anticipated near-future growth of depositions and processing capacity? (As a reference point, RCSB currently employs 6 production servers with load balancing functionality and monitoring.)

f. What queue management strategy should we consider for this application?

The resource requirements (or at least some estimate of them) for each task need to be known to the workflow engine in order to make this regulation possible. See 3.a and 3.b above.

g. What are the optimal hardware capacity requirements for the anticipated production load of the new D&A system? Based on the data we gather for the completed component modules, we will be able to better estimate this.

h. What are the unanticipated implications of designing a distributed load management system?

ACTION ITEM: A plan will be developed for a load and parallel processing test in September. Tom & John to present plan by Sept 2.
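
One possible queue management strategy, offered only as a starting point for the September plan: admit jobs in first-in, first-out order, but start a job only while the sum of the estimated memory of running jobs stays under a budget and the process count stays under a cap. The class names, the budget, and the per-job estimates below are hypothetical; real estimates would come from the memory measurements requested above.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Job:
    name: str
    est_memory_mb: int  # per-task estimate, assumed to be supplied to the engine


class MemoryAwareQueue:
    def __init__(self, memory_budget_mb: int, max_processes: int):
        self.memory_budget_mb = memory_budget_mb
        self.max_processes = max_processes
        self.pending: Deque[Job] = deque()
        self.running: List[Job] = []

    def used_mb(self) -> int:
        return sum(job.est_memory_mb for job in self.running)

    def submit(self, job: Job) -> None:
        if job.est_memory_mb > self.memory_budget_mb:
            raise ValueError(f"{job.name} can never fit in the memory budget")
        self.pending.append(job)

    def dispatch(self) -> List[Job]:
        """Start pending jobs in FIFO order while they fit; return those started."""
        started = []
        while self.pending and len(self.running) < self.max_processes:
            nxt = self.pending[0]
            if self.used_mb() + nxt.est_memory_mb > self.memory_budget_mb:
                break  # strict FIFO: smaller jobs do not jump ahead of the head job
            self.running.append(self.pending.popleft())
            started.append(nxt)
        return started

    def finish(self, job: Job) -> List[Job]:
        """Mark a job as done and start whatever now fits."""
        self.running.remove(job)
        return self.dispatch()


# Example with made-up numbers: a 1000 MB budget and at most 3 processes.
queue = MemoryAwareQueue(memory_budget_mb=1000, max_processes=3)
a, b, c, d = Job("a", 400), Job("b", 400), Job("c", 400), Job("d", 100)
for job in (a, b, c, d):
    queue.submit(job)
first_batch = queue.dispatch()   # a and b start; c would exceed the budget
second_batch = queue.finish(a)   # c and d start once a releases its memory
```

Strict FIFO is the simplest policy and avoids starving large jobs, at the cost of leaving memory idle while a large job waits at the head of the queue. Whether that trade-off is acceptable depends on the job size distribution, which is one of the things the load test should measure.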

Termination status of the sequence processing module: there is a distinction between saving results, pausing (the annotator may re-enter), and sequence processing completed. Jasmine to define the terminology for the buttons defining the possible termination states. Raul to investigate capturing the window close event to update the status.

Some “Run Engine” actions seem to take a very long time to complete. Some explanation of the problem may be found in the level four logs.
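
Returning to the first question raised by the load test (users could not tell that a process had been killed): if the workflow engine launches its tasks as child processes, the exit status already distinguishes a normal finish from a kill. The sketch below shows this with Python's subprocess module; it is an illustration, not a description of how the current engine launches tasks, and the function names and status strings are made up. On POSIX systems a negative return code means the child was terminated by that signal number, and the Linux out-of-memory killer terminates processes with SIGKILL.

```python
import signal
import subprocess
import sys
from typing import List


def describe_exit(returncode: int) -> str:
    """Translate a child process return code into a status for the interface."""
    if returncode == 0:
        return "finished"
    if returncode < 0:
        # POSIX: subprocess reports termination by signal N as return code -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed ({name})"
    return f"failed (exit code {returncode})"


def run_and_report(cmd: List[str]) -> str:
    """Run a command to completion and report how it ended."""
    proc = subprocess.run(cmd)
    return describe_exit(proc.returncode)


# Example: a trivial child process that exits normally.
status = run_and_report([sys.executable, "-c", "pass"])
```

A status such as "killed (SIGKILL)" could then be surfaced at Level 1 alongside wait, running, and finished, so that an annotator is not left waiting on a process that no longer exists.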

4. Ligand module progress. Zukang will set up the ligand search code on the shared server by the end of day Tuesday, August 17.

ACTION ITEM: Jawahar and others will begin testing on Wednesday, August 18, so as to be able to report preliminary results at the next meeting. Jawahar will contact Zukang directly if there are questions about how to test.

5. Next Week Schedule - Dimitris will visit Rutgers on Thursday and Friday next week to work with the annotation group on inhibitor ligand processing. Kim will also be visiting Rutgers on Tuesday next week. Kim will be available if there are questions about inhibitor ligand processing or DOHLC.
