Neurocontroller Design with Rule Extracted by Using Genetic Based Machine Learning and Reinforcement Learning System

Hung-Ching Lu and Ta-Hsiung Hung
Department of Electrical Engineering, Tatung University
No. 40, Sec. 3, Chung Shan North Road, Taipei, 104 Taiwan, Republic of China
TEL: (886)-2-25925252 ext. 3470 re-ext 801; FAX: (886)-2-25941371

Abstract: In this paper, we integrate genetic based machine learning, reinforcement learning, and neural network techniques to develop a completely evolved neural network controller that requires no human operator, no supervisor, and no reference model. The primary extracted rules are called micro rules. A further refinement of the extracted rules is achieved by applying additional genetic search and reinforcement to reduce the number of micro rules. These secondarily extracted rules are called macro rules. This process results in a smaller set of macro rules, which can be used to train a feedforward multilayer perceptron neurocontroller.

Key-Words: Genetic algorithm, Machine learning, Reinforcement learning, Neural network

1. Introduction
Automatic control systems are used in many industrial and military applications [1]. The basic objective of a controller is to provide the appropriate input to a plant so as to obtain the desired output. However, as the complexity of the system increases, the control process becomes very difficult to design and loses efficiency. It is straightforward to control a plant when its input-output relations are known; for this reason, how to acquire control rules for a given system is an important issue. We are interested in a control technique that does not require a priori knowledge of the plant and that can generate the controller's rules automatically. Here, we establish a system whose control rules are extracted by a genetic algorithm and reinforcement. Because the knowledge already exists in the rule base, the job of the neural network is not to discover the required control strategies; rather, its job is to generalize a control law by implementing the existing rule base. The training rules are extracted by a genetic and reinforcement based machine learning system, which we call the GRBML system. In general, a genetic assisted, reinforcement based machine learning system is a massively parallel rule-based system that continuously interacts with a surrounding environment (plant) [2]. Such a system uses a genetic algorithm (GA) [3] for the recombination of and search for rules, and a global reward/punish algorithm for rule reinforcement. The learning mechanism uses an adaptive knowledge-based system that receives messages from, and transmits consequent actions to, the environment. The combination of genetic based machine learning and the reinforcement algorithm can strengthen better rules and delete worse rules. The system, which employs a hybrid genetic search and reinforcement strategy, requires neither a yes/no supervisor nor a reference model. The rules produced first are called micro rules. Refining the micro rules by further genetic search and reinforcement reduces their number; the resulting rules are called macro rules. The macro rules can be used to train a feedforward multilayer perceptron neural network controller. Alternatively, both micro and macro rules can be used directly in a look-up table controller. Both controllers are shown to be robust against noise disturbances and plant parameter variations.
2. The Basic Concepts

Genetic Based Machine Learning System
Here, we use the standard version of error backpropagation to train a multilayer feedforward neural network with sigmoid activation functions. In this case, the binary rules and the associated input-output pairs are used directly as inputs and targets for the neural network [4]. A classifier system is a machine learning system that learns syntactically simple string rules to guide its performance in an arbitrary environment. In general, a GBML system [5] consists of two main parts, an architecture and a learning algorithm, and it can be divided into six components (see Fig. 1):
(1) Detector unit: Senses the state of the environment and transforms it into a binary message.
(2) Classifier unit: It consists of simple binary structured rules that can be evolved and have the following form: IF (condition) THEN (action). The classifier action is taken if the associated condition matches the state of the environment.
(3) Credit unit: Every classifier is assigned a quantity called strength. The credit unit performs the credit assignment task and adjusts the strengths of the classifiers as a function of their performance during the learning process.
(4) GA: An exploration/optimization mechanism. By selecting individual classifiers with high strength to produce offspring, the GA complements the action of the credit algorithm.
(5) Effectors unit: The trigger of the GBML system. It translates the binary-valued action of the classifier best matched to the environment state into a real-valued output that in turn controls the environment. In effect, it acts like a D/A converter.
(6) Message list: Environmental messages are posted to a finite-length message list, where they may activate string rules called classifiers. When activated, a classifier posts a message to the message list. These messages may then invoke other classifiers, or they may cause an action to be taken through the system's action triggers, called effectors. In this way, classifiers combine environmental cues and internal thoughts to determine what the system should do and think next.

Apportionment of Credit Algorithm
Many GBML systems attempt to rank individual classifiers according to each classifier's role in achieving reward from the environment. Although there are a number of ways of doing this, the most prevalent method incorporates what Holland has called a bucket brigade algorithm. This service economy contains two main components: an auction and a clearinghouse. When classifiers are matched, they do not directly post their messages. Instead, having its condition matched qualifies a classifier to participate in an activation auction. Each classifier maintains a record of its net worth, called its strength, and each matched classifier makes a bid proportional to its strength. In this way, rules that are highly fit are given preference over other rules. The auction permits appropriate classifiers to be selected to post their messages. Once a classifier is selected for activation, it must clear its payment through the clearinghouse: a matched and activated classifier sends its bid to those classifiers responsible for sending the messages that matched the bidding classifier's condition. The bid payment is divided in some manner among the matching classifiers. This division of payoff among contributing classifiers helps to ensure the formation of an appropriately sized subpopulation of rules. Thus, different types of rules can cover different types of behavioral requirements without undue interspecies competition.
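As an illustration only (this is not the authors' code), the following Python sketch shows a ternary-alphabet classifier and the auction and clearinghouse steps described above; the bid ratio, initial strength, and payoff value are assumptions made for the example.

    # Minimal sketch of a classifier and the bucket brigade auction/clearinghouse.
    BID_RATIO = 0.1  # fraction of strength a matched classifier bids (assumed value)

    class Classifier:
        def __init__(self, condition, action, strength=10.0):
            self.condition = condition  # string over {'0', '1', '#'}; '#' matches either bit
            self.action = action        # binary action string
            self.strength = strength    # net worth adjusted by credit assignment

        def matches(self, message):
            return all(c == '#' or c == m for c, m in zip(self.condition, message))

    def auction(classifiers, message):
        """Matched classifiers bid in proportion to strength; the highest bidder wins."""
        matched = [c for c in classifiers if c.matches(message)]
        if not matched:
            return None
        return max(matched, key=lambda c: BID_RATIO * c.strength)

    def clearinghouse(winner, previous_senders, payoff=0.0):
        """The winner pays its bid to the classifiers whose messages it matched,
        and collects any external payoff received from the environment."""
        bid = BID_RATIO * winner.strength
        winner.strength -= bid
        if previous_senders:
            share = bid / len(previous_senders)
            for sender in previous_senders:
                sender.strength += share
        winner.strength += payoff

    # Usage example with two hand-made rules and a 3-bit environmental message.
    rules = [Classifier('10#', '111'), Classifier('0##', '000')]
    winner = auction(rules, '101')
    if winner is not None:
        clearinghouse(winner, previous_senders=[], payoff=1.0)
        print(winner.action, round(winner.strength, 2))

Because the winner pays its bid to the classifiers whose messages it matched, chains of rules that eventually lead to external payoff gradually accumulate strength.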
Reinforcement Learning
Reinforcement learning is the on-line learning of an input-output mapping through a process of trial and error designed to maximize a scalar performance index called a reinforcement signal. Although it cannot be claimed that this principle provides a complete model of biological behavior, its simplicity and common-sense approach have made it an influential learning rule. Indeed, we may rephrase Thorndike's law of effect to offer the following definition of reinforcement learning [6-8]: if an action taken by a learning system is followed by a satisfactory state of affairs, then the tendency of the system to produce that particular action is strengthened or reinforced; otherwise, the tendency of the system to produce that action is weakened.

Genetic Algorithm
The genetic algorithm (GA) is an exploratory search and optimization procedure based on the mechanics of natural evolution and population genetics [9]. The GA uses probabilistic rules to make decisions. Typically, the GA starts with little or no knowledge of the correct solution, depending entirely on responses from an interacting environment and on its evolution operators to arrive at good solutions. The GA generally consists of three fundamental operators: reproduction, crossover, and mutation. The GA requires the optimization problem to be stated in the form of a cost function; its flow chart is shown in Fig. 2.

3. Genetic Reinforcement Based Machine Learning System
The GRBML system improves on the GBML system by adding a reinforcement algorithm and a rule insertion mechanism. The learning process in GRBML is off-line and is implemented by a rule extraction method. Rules are extracted by initializing the plant to a random initial state and then invoking the above reinforcement scheme and the GA until a sequence of rules drives the environment to the desired state.

Micro Rules Extraction
The GRBML micro rule extraction scheme can be described as follows:
(1) Initialize all parameters of GRBML with values supplied by the user; randomly generate a preset number of classifiers in the working population and assign each of them an equal strength.
(2) Apply the plant dynamics to compute the state of the plant at the next discrete time step.
(3) Post the environmental message in binary format to GRBML.
(4) The insertion operation is applied at every time step during the first few reinforcement phases for every initial plant state, or just after the GA has been invoked.
(5) The effectors release the action of the classifier with the highest strength among the few classifiers "closest" to the environmental message of the plant at the next time step.
(6) If the number of time steps since the last reinforcement was applied equals the reinforcement period, trigger a new reinforcement operation by proceeding to the next step; otherwise, go back to step 2.
(7) Evaluate the fitness of the classifier sequence selected by the effectors during the last reinforcement period.
(8) If the fitness of the above sequence meets a preset threshold, copy all the classifiers in that sequence to the retrieving population file. Next, if the required number of successful learning phases has been met, stop the whole rule extraction process; otherwise, reinitialize the environment to some random initial state and go to step 2.
(9) If the GA invocation time has been reached, call the GA procedure.
(10) If the plant's state leaves a predefined region of the state space, re-initialize it to the previous initial state.
(11) If the number of reinforcements for a single initial plant state exceeds its upper bound, re-initialize the plant to a new random state.
(12) Go to step 2.
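As an illustration only, the following Python sketch outlines three mechanisms that the loop of steps (1)-(12) would call: action selection by the effectors, global reward/punishment of the classifiers that fired during the last reinforcement period, and a GA step with roulette-wheel selection, one-point crossover, and mutation. The Classifier type is redefined minimally here so the sketch stands alone; the reward/punish factors, mutation rate, and the size of the "closest" set are assumed values, and the plant dynamics and fitness evaluation are placeholders left to the user.

    import random
    from dataclasses import dataclass

    ALPHABET = '01#'
    REWARD, PENALTY = 1.2, 0.8    # multiplicative reward/punish factors (assumed)

    @dataclass
    class Classifier:
        condition: str            # ternary string over {'0', '1', '#'}
        action: str
        strength: float = 10.0

    def closeness(cond, msg):
        # Number of positions where the condition is compatible with the message.
        return sum(c == '#' or c == m for c, m in zip(cond, msg))

    def select_action(population, message, k=5):
        # Step (5): among the k classifiers closest to the message, release the strongest.
        closest = sorted(population, key=lambda c: closeness(c.condition, message),
                         reverse=True)[:k]
        return max(closest, key=lambda c: c.strength)

    def reinforce(active, fitness, threshold):
        # Steps (6)-(8): globally reward or punish the classifiers that fired.
        factor = REWARD if fitness >= threshold else PENALTY
        for c in active:
            c.strength *= factor

    def genetic_step(population):
        # Step (9): roulette-wheel selection on strength, one-point crossover, mutation.
        total = sum(c.strength for c in population)
        def pick():
            r, acc = random.uniform(0, total), 0.0
            for c in population:
                acc += c.strength
                if acc >= r:
                    return c
            return population[-1]
        parent_a, parent_b = pick(), pick()
        cut = random.randrange(1, len(parent_a.condition))
        child_cond = parent_a.condition[:cut] + parent_b.condition[cut:]
        child_cond = ''.join(ch if random.random() > 0.01 else random.choice(ALPHABET)
                             for ch in child_cond)
        child = Classifier(child_cond, random.choice([parent_a, parent_b]).action)
        weakest = min(population, key=lambda c: c.strength)
        population[population.index(weakest)] = child   # replace the weakest classifier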
Macro Rules Retrievals and Rule Based Neural Controller
Although retrievals using the micro rules may provide acceptable results, the performance can be improved by further processing of the existing micro rules; the refined rules are called macro rules (see Fig. 3). In general, backpropagation (BP) is used to train a multilayer feedforward neural network with sigmoid activation units, and the training procedure is shown in Fig. 4. We use the extracted rules to train a multilayer neural network.
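As a rough sketch of the training stage shown in Fig. 4 (not the authors' implementation; the network size, learning rate, number of epochs, and the binary encoding of the macro rules are assumptions), the following Python code trains a small sigmoid multilayer perceptron by backpropagation on macro rules treated as input/target pairs and then evaluates a simple validation error, after which the design would either be accepted or the network size and learning parameters varied.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_mlp(states, actions, hidden=8, lr=0.5, epochs=5000, seed=0):
        rng = np.random.default_rng(seed)
        W1 = rng.normal(0, 0.5, (states.shape[1], hidden))
        W2 = rng.normal(0, 0.5, (hidden, actions.shape[1]))
        for _ in range(epochs):
            h = sigmoid(states @ W1)          # hidden layer activations
            y = sigmoid(h @ W2)               # network output
            err = actions - y
            # Standard backpropagation through the sigmoid units.
            d_out = err * y * (1 - y)
            d_hid = (d_out @ W2.T) * h * (1 - h)
            W2 += lr * h.T @ d_out
            W1 += lr * states.T @ d_hid
        return W1, W2

    # Hypothetical macro rules encoded as binary state -> binary action pairs.
    X = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 1, 0]], dtype=float)
    T = np.array([[1], [0], [1], [0]], dtype=float)
    W1, W2 = train_mlp(X, T)
    # For brevity the rule set itself is reused here; Fig. 4 calls for a separate validation set.
    val_error = np.mean((T - sigmoid(sigmoid(X @ W1) @ W2)) ** 2)
    print("validation MSE:", round(float(val_error), 4))  # accept the design if small enough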
4. Conclusion
A novel machine learning system, the GRBML system, for the rule extraction of temporal control problems has been introduced. The system uses a hybrid of reinforcement learning and GA search. Based on the rules extracted by this system, feedforward neural networks are designed that can successfully control highly nonlinear and noisy systems. The rules extracted by the system can be utilized in three ways. First, they can be used directly for retrievals in a table look-up fashion. Second, with further processing, the rules can be refined into a small number of macro rules and then also used in a table look-up method. Last, the macro rules themselves can be used to train feedforward neural networks. The neurocontroller (the macro rule-based neural controller) outperformed both the micro and the macro rule-based table look-up controllers in most of the above applications. Moreover, the neurocontrollers show more robustness when controlling non-nominal or noise-corrupted plants.

Acknowledgment: This work was supported by the National Science Council of the Republic of China under contract NSC-91-2213-E036-010.

References
[1] W. T. Miller, R. S. Sutton, and P. J. Werbos, Eds., Neural Networks for Control. Cambridge, MA: MIT Press, 1990.
[2] J. H. Holland, "A mathematical framework for studying learning in classifier systems," Physica, vol. 22, pp. 307-311, 1986.
[3] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley, 1989.
[4] A. U. Levin and K. S. Narendra, "Control of nonlinear dynamical systems using neural networks: Controllability and stabilization," IEEE Trans. Neural Networks, vol. 4, pp. 192-206, 1993.
[5] A. Abdulaziz and M. Farsi, "Non-linear system identification and control based on neural and self-tuning control," Int. J. Contr., vol. 7, no. 6, pp. 297-307, 1993.
[6] M. D. Waltz and K. S. Fu, "A heuristic approach to reinforcement learning control systems," IEEE Trans. Automat. Contr., vol. 10, pp. 390-398, 1965.
[7] E. L. Thorndike, Animal Intelligence. Darien, CT: Hafner, 1911.
[8] R. S. Sutton, A. G. Barto, and R. J. Williams, "Reinforcement learning is direct adaptive optimal control," Proceedings of the American Control Conference, Boston, MA, pp. 2143-2146, 1991.
[9] K. Krishnakumar and D. E. Goldberg, "Control system optimization using genetic algorithm," J. Guidance, Contr., and Dynam., vol. 15, no. 3, pp. 735-740, 1992.

[Fig. 1: The learning GBML system interacts with its environment]
[Fig. 2: The flow chart of GA]
[Fig. 3: Synthesis of the macro rules]
[Fig. 4: Neural network used by macro rule training]