The Particle Swarm: Theme and Variations on Computational Social Learning
James Kennedy, Washington, DC
Kennedy.Jim@gmail.com

The Particle Swarm
• A stochastic, population-based algorithm for problem-solving
• Based on a social-psychological metaphor
• Used by engineers, computer scientists, applied mathematicians, etc.
• First reported in 1995 by Kennedy and Eberhart
• Constantly evolving

The Particle Swarm Paradigm is a Particle Swarm
A kind of program comprising a population of individuals that interact with one another according to simple rules in order to solve problems, which may be very complex. It is also an apt description of the process of science.

Two Spaces
• The social network topology, and
• The state of the individual as a point in a Cartesian coordinate system
• A moving point = a particle; a bunch of them = a swarm
• Note memes: "evolution of ideas" vs. "changes in the people who hold ideas"
• Minimizing or maximizing a function result by adjusting parameters

Cognition as Optimization
• Cognitive consistency theories, incl. dissonance
• Parallel constraint satisfaction
• Feedforward neural nets
• Particle swarm describes the dynamics of the network, as opposed to its equilibrium properties

Mind and Society
• Minsky: minds are simply what brains do.
• No: minds are what socialized human brains do. Otherwise: solipsism.

Dynamic Social Impact Theory
Nowak, A., Szamrej, J., & Latané, B. (1990). From private attitude to public opinion: A dynamic theory of social impact. Psychological Review, 97, 362-376.
i = f(SIN)
• Computer simulation: a 2-d cellular automaton
• Each individual is both a target and a source of influence
• "Euclidean" neighborhoods
• Binary, univariate individuals
• "Strength" randomly assigned
• Result: polarization

Particle Swarms: The Population
• To understand the particle swarm, you have to understand the interactions of the particles
• One particle is the stupidest thing in the world
• The population learns
• Every particle is a teacher and a learner
• Social learning, norms, conformity, group dynamics, social influence, persuasion, self-presentation, cognitive dissonance, symbolic interactionism, cultural evolution …

Neighborhoods: Population Topology
• Gbest; Lbest (all N=20)
• Particles learn from one another. Their communication structure determines how solutions propagate through the population.

von Neumann ("Square") Topology
Regular, easy to understand, and works adequately with a variety of versions: perhaps not the best for any version, but not the worst.

The Particle Swarm: Pseudocode

    Initialize population and constants
    Repeat
        For i = 1 to population size
            CurrentEval_i = eval(x_i)
            If CurrentEval_i < pbest_i then
                pbest_i = CurrentEval_i
                For d = 1 to Dimension
                    p_id = x_id
                Next d
                If CurrentEval_i < pbest_gbest then gbest = i
            End if
            g = best neighbor's index
            For d = 1 to Dimension
                v_id = W*v_id + U(0, AC)*(p_id - x_id) + U(0, AC)*(p_gd - x_id)
                x_id = x_id + v_id
            Next d
        Next i
    Until termination criterion

Pseudocode Chunk #1

    CurrentEval_i = eval(x_i)
    If CurrentEval_i < pbest_i then
        pbest_i = CurrentEval_i
        For d = 1 to Dimension
            p_id = x_id
        Next d
        If CurrentEval_i < pbest_gbest then gbest = i
    End if

• Can come at the top or the bottom of the loop
• In the gbest topology, g = gbest
• It is useful to track the population best

Pseudocode Chunk #2

    g = best neighbor's index
    For d = 1 to Dimension
        v_id = W*v_id + U(0, AC)*(p_id - x_id) + U(0, AC)*(p_gd - x_id)
        x_id = x_id + v_id
    Next d

• Constriction Type 1″: W = 0.7298, AC = W × 2.05 = 1.496; Clerc 2006: W = 0.7, AC = 1.43
• Inertia: W might vary with time, e.g. in (0.4, 0.9); AC = 2.0 typically
• Note the three components of velocity.
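The pseudocode above can be turned into a minimal runnable sketch. This is not a reference implementation: the gbest topology, the Sphere test function, and the initialization bounds are illustrative choices; the constriction parameters W = 0.7298 and AC = 1.496 are the Type 1″ values from the slide.

```python
import random

def sphere(x):
    # Standard test function: minimum is 0 at the origin.
    return sum(xi * xi for xi in x)

def pso(f, dim=10, n=20, iters=1000, w=0.7298, ac=1.496, lo=-5.0, hi=5.0):
    # Initialize positions, velocities, and personal bests.
    x = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    v = [[0.0] * dim for _ in range(n)]
    p = [xi[:] for xi in x]                  # personal best positions
    pbest = [f(xi) for xi in x]              # personal best values
    gbest = min(range(n), key=lambda i: pbest[i])
    for _ in range(iters):
        for i in range(n):
            val = f(x[i])
            if val < pbest[i]:               # update personal best
                pbest[i], p[i] = val, x[i][:]
                if val < pbest[gbest]:       # track the population best
                    gbest = i
            g = gbest                        # gbest topology: g is the population best
            for d in range(dim):
                v[i][d] = (w * v[i][d]
                           + random.uniform(0, ac) * (p[i][d] - x[i][d])
                           + random.uniform(0, ac) * (p[g][d] - x[i][d]))
                x[i][d] += v[i][d]
    return pbest[gbest]

best = pso(sphere)  # best value found; converges toward 0 on Sphere
```

In the lbest or von Neumann topologies, `g` would instead be the best particle among `i`'s neighbors rather than the population best.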
Vmax: not necessary, but it might help.

Q: Are these formulas arbitrary?

Some Standard Test Functions
• Sphere
• Rosenbrock
• Rastrigin
• Schaffer's f6
• Griewank

Test Functions: Typical Results
[Figure: typical performance curves on the standard test functions.]

Step-Size Depends on Neighbors
Movement of the particle through the search space is centered on the mean of p_i and p_g on each dimension, and its amplitude is scaled to their difference.
• Examples: p_i = 0, p_g = 0; p_i = +2, p_g = -2; p_i = +0.1, p_g = -0.1
• Exploration vs. exploitation: automatic
• … "the box" …

Search Distribution: Scaled to Neighborhood
• Previous bests held constant at +10
• A million iterations
• Q: What is the distribution of points that are tested by the particle?

"Bare Bones" Particle Swarm

    x = G((p_i + p_g)/2, abs(p_i - p_g))

• G(mean, s.d.) is a Gaussian RNG
• Simplified (!)
• Works pretty well, but not as well as the canonical version.

Kurtosis
• Peaked, with fat tails
• Empirical observations with p's held constant, tails trimmed and not trimmed
• High peak, fat tails

Mean moments of the canonical particle swarm algorithm with previous bests set at +20, varying the number of iterations:
    Iterations   Mean      S.D.      Skewness   Kurtosis
    1,000         0.0970   37.7303   -0.0617      8.008
    3,000         0.0214   41.5281    0.0814     18.813
    10,000       -0.0080   41.6614   -0.0679     40.494
    100,000       0.0022   41.7229    0.2116    170.204
    1,000,000     0.0080   41.3048    0.3808    342.986

Bursts of Outliers
"Volatility clustering" seems to typify the particle's trajectory.
[Figure: a sample particle trajectory showing bursts of outliers.]

Adding Bursts of Outliers to Bare Bones PSO

    Center = (p_id + p_gd)/2
    SD = |p_id - p_gd|
    x_id = G(0, 1)
    If Burst = 0 and U(0,1) < PBurstStart then
        Burst = U(0, maxpower)
    Else if Burst > 0 and U(0,1) < PBurstEnd then
        Burst = 0
    End if
    If Burst > 0 then x_id = x_id ^ Burst
    x_id = Center + x_id × SD

[Figure: results on Sphere, Griewank30, Rosenbrock, Griewank10, Rastrigin, and f6; the bubbled line is the canonical particle swarm.]

"The Box"
Where the particle goes next depends on which way it was already going, the random numbers, and the sign and magnitude of the differences.

    v_id = W*v_id + U(0, AC)*(p_id - x_id) + U(0, AC)*(p_gd - x_id)
    x_id = x_id + v_id

The area where it can go is crisply bounded, but the probability inside the box is not uniformly dense.
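The bare-bones sampler with bursts of outliers can be sketched as below. This is an assumption-laden reading of the pseudocode: `p_start`, `p_end`, and `maxpower` stand in for PBurstStart, PBurstEnd, and maxpower, and the sign of the standard Gaussian sample is preserved when raising it to a power, since a negative base with a fractional exponent is undefined in real arithmetic.

```python
import random

def burst_bare_bones_step(p_i, p_g, burst, p_start=0.01, p_end=0.05, maxpower=3.0):
    # One bare-bones move with occasional "bursts" of outliers: while a burst
    # is active, the standard Gaussian sample is raised to a random power,
    # fattening the tails the way the canonical particle's trajectory does.
    if burst == 0.0 and random.random() < p_start:
        burst = random.uniform(0.0, maxpower)    # start a burst
    elif burst > 0.0 and random.random() < p_end:
        burst = 0.0                              # end the burst
    new_x = []
    for pi, pg in zip(p_i, p_g):
        center = (pi + pg) / 2.0                 # mean of the two bests
        sd = abs(pi - pg)                        # scaled to their distance
        z = random.gauss(0.0, 1.0)
        if burst > 0.0:
            # Assumption: keep the sign when applying the power.
            z = (1 if z >= 0 else -1) * abs(z) ** burst
        new_x.append(center + z * sd)
    return new_x, burst
```

The burst state is threaded through the calls, so a particle carries its current burst exponent from one iteration to the next.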
Empirical Distribution of Means of Random Numbers from Different Ranges
• Simulate with a uniform RNG, trim the tails

TUPS: Truncated-Uniform Particle Swarm
• Start at the current position, x(t)
• Move a weighted amount in the same direction: W1 × (x(t) - x(t-1))
• Find the midpoint of the extremes, and the difference between them, on each dimension (the sides of "the box")
• Weight that, add it to where you are, and call it the "center"
• Generate a uniformly distributed random number around the center, with range slightly less than the length of the side
• That is x(t+1)

    x(t+1) = x(t) + W1 × (x(t) - x(t-1)) + W2 × (U(-1, +1) × (width/2.5) + center)

• W1 = 0.729; W2 = 1.494
• Width is the difference between the highest and lowest (p - x); this generates a point less than width/2 from the center.

[Figure: Canonical vs. TUPS vs. FIPS on Sphere, Rastrigin, Griewank30, Rosenbrock, f6, and Griewank10, over 3000 iterations.]

Binary Particle Swarms
• Transform velocity with a sigmoid function into (0 .. 1): S(v) = 1 / (1 + exp(-v))
• v = [the usual]
• If rand() < S(v) then x = 1, else x = 0: the squashed velocity is used as a probability threshold
• Though this is a radically different concept, the principles of particle interaction still apply (because the power is in the interactions).

FIPS: The "Fully-Informed" Particle Swarm (Rui Mendes)

    v(t+1) = W1 × v(t) + Σ(rand() × (W2/K) × (p_k - x(t)))
    x(t+1) = x(t) + v(t+1)

(K = number of neighbors, k = a neighbor's index, W2 is a sum.)
• Note that p_i is not a source of influence in FIPS.
• It doesn't select the best neighbor; it orbits around the mean of the neighborhood bests.
• This version is more dependent on topology.
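The FIPS velocity update can be sketched per dimension as follows. The value W2 = 2.988 (the sum of the two canonical acceleration coefficients, 2 × 1.494) is an assumption consistent with the slide's note that W2 is a sum; it is not specified in the text.

```python
import random

W1 = 0.7298   # persistence weight, from the constriction setup
W2 = 2.988    # assumed: sum of the two canonical acceleration coefficients

def fips_velocity(v, x, neighbor_bests):
    # Fully-informed update: every neighbor's best contributes, each term
    # weighted by a fresh random number; no single "best neighbor" is chosen.
    K = len(neighbor_bests)
    new_v = []
    for d, (vd, xd) in enumerate(zip(v, x)):
        social = sum(random.random() * (W2 / K) * (p[d] - xd)
                     for p in neighbor_bests)
        new_v.append(W1 * vd + social)
    return new_v

# The position update is then: x(t+1) = x(t) + v(t+1), dimension by dimension.
```

Note that the particle's own previous best never appears here, which is why FIPS is more sensitive to the choice of topology.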
Deconstructing Velocity
Because x(t+1) = x(t) + v(t+1), we know that on the previous iteration x(t) = x(t-1) + v(t). So we can find v(t):

    v(t) = x(t) - x(t-1)

and substitute it into the formula, to put it all on one line:

    x_id(t+1) = x_id(t) + W1 × (x_id(t) - x_id(t-1)) + rand() × W2 × (p_id - x_id(t)) + rand() × W2 × (p_gd - x_id(t))

Generalization and Verbal Representation
We can generalize the canonical and FIPS versions:

    x_id(t+1) = x_id(t) + W1 × (x_id(t) - x_id(t-1)) + Σ(rand() × (W2/K) × (p_kd - x_id(t)))

… or in words:

    NEW POSITION = CURRENT POSITION + PERSISTENCE + SOCIAL INFLUENCE

Social influence has two components, central tendency and dispersion:

    NEW POSITION = CURRENT POSITION + PERSISTENCE + SOCIAL CENTRAL TENDENCY + SOCIAL DISPERSION

… Hmmm, this gives us something to play with!

Gaussian "Essential" Particle Swarm
Note that only the last term has randomness in it; the rest is deterministic.

    meanp = (p_id + p_gd)/2
    disp = abs(p_id - p_gd)/2
    x_id(t+1) = x_id(t) + W1 × (x_id(t) - x_id(t-1)) + W2 × (meanp - x_id) + G(0,1) × disp

G(0,1) is a Gaussian RNG.

Gaussian Essential Particle Swarm: Results

    Function     Trials   Canonical   Gaussian
    F6           20       0.0015      13E-10
    GRIEWANK     20       0.0086      0.0103
    GRIEWANK10   20       0.0508      0.045
    RASTRIGIN    20       56.862      49.151
    ROSENBROCK   20       41.836      44.197
    SPHERE       20       88E-16      38E-24

Various Probability Distributions

    Function     Trials   Canonical   Gaussian   Triangular   DoubleExponential   Cauchy
    F6           20       0.0015      13E-10     0.0057       0.001               0.0029
    GRIEWANK     20       0.0086      0.0103     0.0275       0.0149              0.0253
    GRIEWANK10   20       0.0508      0.045      0.0694       0.0541              0.0768
    RASTRIGIN    20       56.862      49.151     140.94       47.26               33.829
    ROSENBROCK   20       41.836      44.197     67.894       41.308              42.054
    SPHERE       20       88E-16      38E-24     1.70E-18     2.40E-22            1.60E-17

There is clearly room for exploration here.
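The Gaussian "essential" update above can be sketched per dimension. The weights W1 = 0.729 and W2 = 1.494 are borrowed from the constriction setup used elsewhere in the talk; the slide itself does not fix values for this variant, so treat them as assumptions.

```python
import random

W1, W2 = 0.729, 1.494  # assumed weights, borrowed from the constriction setup

def gaussian_essential_step(x, x_prev, p_i, p_g):
    # NEW POSITION = CURRENT + PERSISTENCE + CENTRAL TENDENCY + DISPERSION
    new_x = []
    for xd, xpd, pid, pgd in zip(x, x_prev, p_i, p_g):
        meanp = (pid + pgd) / 2.0              # social central tendency
        disp = abs(pid - pgd) / 2.0            # social dispersion
        new_x.append(xd
                     + W1 * (xd - xpd)                 # persistence
                     + W2 * (meanp - xd)               # deterministic pull toward the mean
                     + random.gauss(0.0, 1.0) * disp)  # the only stochastic term
    return new_x
```

When both bests coincide with the current position, the dispersion term vanishes and the step is fully deterministic, which makes the "only the last term has randomness" observation easy to verify.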
Gaussian FIPS
Fully informed: uses all neighbors.

    FIPScenter = mean of (p_kd - x_id)
    FIPSrange = mean of abs(p_id - p_kd)
    x_id = x_id + W1 × (x_id(t) - x_id(t-1)) + W2 × FIPScenter + G(0,1) × (FIPSrange/2)

Gaussian FIPS compared to canonical PSO, square topology:

    Function     Trials   Canonical   Gaussian FIPS
    F6           20       0.0015      0.001
    GRIEWANK     20       0.0086      0.0007
    GRIEWANK10   20       0.0508      0.0215
    RASTRIGIN    20       56.862      36.858
    ROSENBROCK   20       41.836      40.365
    SPHERE       20       88E-16      41E-29

Gaussian FIPS vs. Canonical PSO
Modified Bonferroni correction, alpha = 0.05:

    Function     t(38)   p-value   Rank   Inv. rank   New alpha   Sig.
    SPHERE       3.17    0.0030    1      6           0.008333    *
    GRIEWANK10   2.99    0.0048    2      5           0.010000    *
    GRIEWANK     2.64    0.0118    3      4           0.012500    *
    RASTRIGIN    2.21    0.0333    4      3           0.016667    .
    F6           0.45    0.6583    5      2           0.025000    .
    ROSENBROCK   0.15    0.8782    6      1           0.050000    .

Ref: Jaccard, J., & Wan, C. K. (1996). LISREL approaches to interaction effects in multiple regression. Thousand Oaks, CA: Sage Publications.

Mendes: Two Measures of Performance
• Color and shape indicate parameters of the social network: degree, clustering, etc.
[Figure panels: best topologies; FIPS versions; best-neighbor versions; worst FIPS; sociometries and proportions successful.]

Understanding the Particle Swarm
• Lots of variation in particle movement formulas
• Teachers and learners
• Propagation of knowledge is central
• Interaction method and social network topology … interact
• It's simple, but difficult to understand

In Sum
There is some fundamental process:
• It uses information from neighbors
• It seems to require a balance between "persistence" and "influence"
Decomposing a version we know is OK:
• We can understand it
• We can improve it
Particle trajectories:
• Arbitrary? Not quite
• They can be replaced by an RNG (the trajectory is not the essence)
How it works:
• It works by sharing successes among individuals
• We need to look more closely at the sharing phenomenon itself

Send me a note:
Jim Kennedy
Kennedy.Jim@gmail.com