A Learning Classifier System for Distributed Max-Flow Algorithm Fault Detection

David Andrew Cape

November 30, 2005

Abstract

A distributed version of the Goldberg-Tarjan Max-Flow algorithm can be used in power network simulation. Software faults can be intentionally injected into the algorithm, and assertions have been developed to detect them. However, one wonders whether it is possible to detect and classify the injected faults by examining the message-passing history, without accessing the internal data of the program. This leads to the general question of how much can be learned from the communication between processors alone. This research describes a learning classifier system (LCS), combining aspects of XCS and S-classifiers, which attempts to diagnose the condition of the program by examining the message-passing history from each processor’s perspective. The goal is to avoid processing the partial-order data structure of events. Because the history of messages is piggy-backed on each message, each processor may have a slightly different history. The hope is that by processing the histories linearly, using the history file of each processor, one processor at a time, faults can be classified with a reasonable degree of accuracy. An LCS is appropriate for this effort because it would be difficult to define deterministically the conditions which correspond to faults. A statement of results will appear in the final draft.

Introduction

A distributed version of the Goldberg-Tarjan Max-Flow algorithm can be used in power network simulation. Power network simulation is important because it can help in the design and maintenance of the electric power grid. Research is under way at the University of Missouri – Rolla and elsewhere to develop techniques for protecting the power grid from terrorist attacks and natural events which could cause line outages and lead to cascading failures (blackouts).
The Max-Flow algorithm can be used in these efforts. Software faults can be intentionally injected into the algorithm, and assertions have been developed to detect them. However, one wonders whether it is possible to detect and classify the injected faults by examining the message-passing history, without accessing the internal data of the program. This leads to the general question of how much can be learned from the communication between processors alone. Please see [AGMC] for details about the distributed Max-Flow algorithm and assertion-checking.

The relevant message types are as follows:

PFm: attempting to push flow to a neighboring vertex
AFm: accepting the requested flow
RFm: rejecting the requested flow
Distm: updating a node’s distance
Fm: a fault message indicating the detection of a fault by assertion violation

The injected faults are of the following forms:

Edge fault: a given edge’s flow is arbitrarily increased by 10%
Vertex fault: a given vertex’s calculated excess flow is doubled
Lose all flow messages: cease transmitting PFms
Randomly lose flow messages: lose PFms with probability 0.1%
Alter all flow messages: modify each PFm by 1 unit of flow
Randomly alter all flow messages: modify each PFm by 1 unit of flow with probability 0.1%
Invert all accept/reject messages: change AFms to RFms and vice versa
Randomly invert accept/reject messages: change as above, with probability 0.1%

This research describes a learning classifier system (LCS), combining aspects of XCS and S-classifiers [ES], which attempts to diagnose the condition of the program by examining the message-passing history from each processor’s perspective.
The goal is to avoid processing the partial-order data structure of events. Because the history of messages is piggy-backed on each message, each processor may have a slightly different history. The hope is that by processing the histories linearly, using the history file of each processor, one processor at a time, faults can be classified with a reasonable degree of accuracy. An LCS is appropriate for this effort because it would be difficult to define deterministically the conditions which correspond to faults. A statement of results will appear in the final draft.

Design

The Michigan model for a learning classifier system will be used (one individual is one rule). A rule is a tuple of the form <c, a, p, e, F>, where the letters stand for condition, action, predicted payoff, accuracy, and fitness. The rule evaluation cycle will follow the XCS framework described in [ES]. The only action for an individual rule is the diagnosis of TRUE (fault detected). A default rule, whose action is the diagnosis of FALSE (no fault detected), applies only when no other rule has a condition which matches the state of the environment. The environment is the recent message-passing history (a sliding window of length N) from the perspective of one of the processors used in the distributed max-flow algorithm. A rule may diagnose TRUE at any time during the processing of the history file, whereas FALSE can be diagnosed only at the end. A reward will be allocated at the end of processing the entire file.

The rule discovery cycle will follow the rule evaluation cycle and will be done along the lines of genetic programming (GP), also described in [ES], because the conditions will be s-expressions represented by parse trees. The terminals will be primitive statements such as x_i = m_j, where x_i is a variable representing the ith message in the window, and m_j is a constant message type. The operators in the parse tree will be AND and NOT, because OR can be derived from the other two.
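
The rule representation described above can be illustrated with a minimal Python sketch. It is not the author's implementation: the tuple encoding of parse trees, the names `Rule`, `matches`, `or_`, and `diagnose`, and the choice to evaluate only full windows of length N are all assumptions made for illustration. The XCS parameter updates and reward allocation are omitted.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical parse-tree encoding (an assumption for this sketch):
#   ("EQ", i, m)   terminal x_i = m_j: the i-th message in the window has type m
#   ("NOT", t)     negation of subtree t
#   ("AND", l, r)  conjunction of subtrees l and r


def matches(cond: tuple, window: Sequence[str]) -> bool:
    """Evaluate a condition parse tree against one window of message types."""
    op = cond[0]
    if op == "EQ":
        _, i, m = cond
        return 0 <= i < len(window) and window[i] == m
    if op == "NOT":
        return not matches(cond[1], window)
    if op == "AND":
        return matches(cond[1], window) and matches(cond[2], window)
    raise ValueError(f"unknown operator: {op!r}")


def or_(left: tuple, right: tuple) -> tuple:
    """Derive OR from AND and NOT: a OR b == NOT(AND(NOT a, NOT b))."""
    return ("NOT", ("AND", ("NOT", left), ("NOT", right)))


@dataclass
class Rule:
    """One individual in the Michigan model: the tuple <c, a, p, e, F>."""
    condition: tuple        # c: parse tree as encoded above
    action: bool = True     # a: ordinary rules only diagnose TRUE
    payoff: float = 0.0     # p: predicted payoff
    accuracy: float = 0.0   # e: accuracy
    fitness: float = 0.0    # F: fitness


def diagnose(rules: List[Rule], history: Sequence[str], n: int) -> bool:
    """Slide a window of length n over the history of one processor.

    Returns True as soon as any rule matches a window. If no rule ever
    matches, the default rule applies at the end and returns False.
    """
    for start in range(len(history) - n + 1):
        window = history[start:start + n]
        if any(matches(rule.condition, window) for rule in rules):
            return True
    return False  # default rule: no fault detected
```

For example, a rule with condition `("AND", ("EQ", 0, "PFm"), ("EQ", 1, "RFm"))` and N = 2 would diagnose TRUE on any history in which a PFm is immediately followed by an RFm. Whether such a pattern actually indicates a fault is exactly what the LCS is meant to learn.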
The exact method of reproduction and competition has not yet been decided, but recombination will be used more than mutation.

Experimental Setup

The goal at this stage is to see some improvement in the training phase. A testing phase may be designed later to provide a framework for validating the work and evaluating different parameter sets, but in this initial stage any learning evidenced by the development of a rule set with increasing accuracy of diagnoses of faults will be acceptable. Later, one might want to compare the accuracy of diagnosing different types of faults. There is no formal experimental setup at this time.

Implementation

In progress. The aim is to have a basic Rule Evaluation Cycle implemented by Saturday, December 3, 2005, and the Rule Discovery Cycle implemented by Monday, December 5, 2005. Data processing (training) could occur from Tuesday to Thursday.

Results and Interpretation

None yet. The aim is to have some results by Thursday, December 8, 2005, and a final draft with interpretation and conclusion by Friday, December 9, 2005.

Related Work

Austin Armbruster et al. [AGMC] have developed a framework for testing the error-detection capabilities of certain assertions which examine the program state and have obtained very good results. It is unlikely that this effort to use an LCS will match that quality, because it examines only the message history and not the internal state of the system. In some sense, this work could be seen as complementary to the assertion-checking work, not competing with it.

Conclusion

The conclusion awaits the results and interpretation. The author is currently satisfied with the design and has been encouraged to proceed to the implementation phase, so good results are hoped for.

Bibliography

[AGMC] A. Armbruster, M. Gosnell, B. McMillin, M. Crow. “Power Transmission Control Using Distributed Max-Flow”, https://svn.umr.edu/research/csfil/trunk/papers/COMPSAC05_AA/Main.pdf
[ES] A.E. Eiben, J.E. Smith. Introduction to Evolutionary Computing. Springer-Verlag, Berlin Heidelberg, 2003.