Learning in Multiagent Systems

Prepared by: Jarosław Szymczak
Based on: „Fundamentals of Multiagent Systems with NetLogo Examples" by José M. Vidal

Scenarios of learning

- Cooperative learning – e.g. each agent has its own map and, together with the other agents, aggregates a global view.
- Competitive learning – e.g. each selfish agent tries to maximize its own utility by learning about the behaviors and weaknesses of the other agents.
- Agents learn because:
  - they don't know everything about the environment,
  - they don't know how the other agents behave.

The Machine Learning Problem

- "The goal of machine learning research is the development of algorithms that increase the ability of an agent to match a set of inputs to their corresponding outputs" (Mitchell, 1997).
- For example, the input could be a set of photos depicting people and the output set could be {man, woman}; the machine learning algorithm then has to learn to classify the photos correctly.
- The input set is usually divided into a training set and a testing set; the two can be interleaved.
- (Figure: graphical representation of the machine learning problem.)
- Inductive bias – some learning algorithms perform better than others in certain domains (e.g. two algorithms can both learn to classify the + and − examples perfectly and yet represent different functions).
- No free lunch theorem – averaged over all possible learning problems, there is no learning algorithm that outperforms all others.
- In a multiagent scenario some of the fundamental assumptions of machine learning are violated: the input is no longer fixed, it keeps changing because the other agents are also learning.

Cooperative learning

- Suppose we are given two robots that can communicate; they can share what they have learned (their capabilities, knowledge about the terrain, etc.).
- Sharing information is easy if the robots are identical; if not, we need to model their capabilities somehow in order to decide which information would be useful to the other robot.
- Most systems that share learned knowledge among agents, such as (Stone, 2000), simply assume that all agents have the same capabilities.

Repeated games

- Nash equilibrium – I choose what is best for me given what you are doing, and you choose what is best for you given what I am doing.
- In repeated games two players face each other repeatedly, as in e.g. the prisoner's dilemma.
- A Nash equilibrium is based on the assumption of perfectly rational players. In learning in games the assumption is instead that agents use some kind of learning algorithm; the theory determines the equilibrium strategy that the various learning mechanisms arrive at and maps these equilibria to the standard solution concepts, where possible.

Fictitious play

- The agent remembers everything the other agents have done: it keeps counts of their past actions, treats the empirical frequencies as their mixed strategies, and plays a best response to them.
- Some theorems:
  - (Nash Equilibrium is Attractor to Fictitious Play.) If s is a strict Nash equilibrium and it is played at time t, then it will be played at all times greater than t (Fudenberg and Kreps, 1990).
  - (Fictitious Play Converges to Nash.) If fictitious play converges to a pure strategy, then that strategy must be a Nash equilibrium (Fudenberg and Kreps, 1990).
- Infinite cycles problem – fictitious play can cycle forever between actions; this can be avoided by adding randomness to the agents' choices. (Figure: an example of an infinite cycle in fictitious play.)
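To make the fictitious-play rule concrete, below is a minimal Python sketch. It is not from the original slides: the 2x2 symmetric coordination game, the pseudo-counts, and the names best_response and fictitious_play are illustrative assumptions. Each player best-responds to the empirical frequencies of the other player's past actions.

```python
# Minimal sketch of two-player fictitious play (illustrative; not from the slides).
import numpy as np

# Row player's payoffs in an assumed 2x2 symmetric coordination game.
PAYOFF = np.array([[2.0, 0.0],
                   [0.0, 1.0]])

def best_response(opponent_counts, payoff=PAYOFF):
    """Best-respond to the empirical frequency of the opponent's past actions."""
    freqs = opponent_counts / opponent_counts.sum()
    expected = payoff @ freqs          # expected payoff of each of my actions
    return int(np.argmax(expected))

def fictitious_play(rounds=50):
    # Each player keeps a count of every action the other player has taken so far;
    # uniform pseudo-counts avoid division by zero on the first round.
    counts = [np.ones(2), np.ones(2)]
    history = []
    for _ in range(rounds):
        a0 = best_response(counts[1])  # player 0 responds to player 1's history
        a1 = best_response(counts[0])  # player 1 responds to player 0's history
        counts[0][a0] += 1
        counts[1][a1] += 1
        history.append((a0, a1))
    return history

if __name__ == "__main__":
    print(fictitious_play(rounds=10))
```

With these assumed payoffs both players quickly lock into the same action, which illustrates the theorem above: once a strict Nash equilibrium is played, it keeps being played.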
Replicator dynamics

- This model assumes that the fraction of agents playing a particular strategy will grow in proportion to how well that strategy performs in the population.
- A homogeneous population of agents is assumed; the agents are randomly paired in order to play a symmetric game (same strategies and payoffs). The model is inspired by biological evolution.
- Let φt(s) be the number of agents using strategy s at time t, ut(s) the expected utility for an agent playing strategy s at time t, and u(s, s′) the utility that an agent playing s receives against an agent playing s′. In the standard formulation we can define:
  - θt(s) = φt(s) / Σs′ φt(s′) – the fraction of the population playing s,
  - ut(s) = Σs′ θt(s′) u(s, s′) – the expected utility of s against a randomly chosen opponent,
  - ūt = Σs θt(s) ut(s) – the average utility in the population,
  and the share of s grows in proportion to its relative performance: θt+1(s) = θt(s) ut(s) / ūt (for positive utilities), or in continuous time dθ(s)/dt = θ(s) (ut(s) − ūt). A numerical sketch of this update appears at the end of the section.
- Some theorems:
  - (Nash Equilibrium is a Steady State.) Every Nash equilibrium is a steady state of the replicator dynamics (Fudenberg and Levine, 1998).
  - (Stable Steady State is a Nash Equilibrium.) A stable steady state of the replicator dynamics is a Nash equilibrium. A stable steady state is one that, after suffering a small perturbation, is pushed back to the same steady state by the system's dynamics (Fudenberg and Levine, 1998).
  - (Asymptotically Stable is Trembling-Hand Nash.) An asymptotically stable steady state corresponds to a Nash equilibrium that is trembling-hand perfect and isolated. That is, the stable steady states are a refinement of Nash equilibria: only a few Nash equilibria are stable steady states (Bomze, 1986).

Evolutionarily stable strategy

- An ESS is an equilibrium strategy that can overcome the presence of a small number of invaders. That is, if the equilibrium strategy profile is ω and a small fraction ε of invaders start playing ω′, then an ESS requires that the existing population get a higher payoff against the new mixture εω′ + (1 − ε)ω than the invaders do.
- (ESS is a Steady State of Replicator Dynamics.) An ESS is an asymptotically stable steady state of the replicator dynamics. However, the converse need not be true: a stable state of the replicator dynamics need not be an ESS (Taylor and Jonker, 1978).

AWESOME algorithm

- The abbreviation stands for: Adapt When Everybody is Stationary, Otherwise Move to Equilibrium.

Stochastic games

- COMING SOON (THIS AUTUMN)
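As a complement to the replicator-dynamics definitions above, here is a small numerical sketch in Python. It is not from the original slides: the payoff matrix, starting shares, and step count are illustrative assumptions. Each step reweights every strategy's population share by its expected utility against the current population, so better-performing strategies grow.

```python
# Minimal sketch of discrete-time replicator dynamics for a symmetric game
# (illustrative example, not from the slides; payoffs are assumed positive).
import numpy as np

# Assumed payoff matrix u(s, s'): rows = my strategy, columns = opponent's strategy.
U = np.array([[1.0, 4.0],
              [2.0, 3.0]])

def replicator_step(theta, payoff=U):
    """One update: theta_{t+1}(s) = theta_t(s) * u_t(s) / average population utility."""
    u = payoff @ theta            # u_t(s): expected utility of each strategy
    avg = theta @ u               # population-average utility
    return theta * u / avg        # shares still sum to 1

theta = np.array([0.9, 0.1])      # initial population shares
for _ in range(50):
    theta = replicator_step(theta)
print(theta)                      # shares after 50 steps
```

With this assumed payoff matrix the shares converge towards (0.5, 0.5), the game's symmetric mixed Nash equilibrium, in line with the steady-state theorems above.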