The Particle Swarm: Theme and Variations on Computational Social Learning

advertisement
The Particle Swarm:
Theme and Variations on
Computational
Social Learning
James Kennedy
Washington, DC
Kennedy.Jim@gmail.com
The Particle Swarm
A stochastic, population-based algorithm for problem-solving
Based on a social-psychological metaphor
Used by engineers, computer scientists, applied mathematicians, etc.
First reported in 1995 by Kennedy and Eberhart
Constantly evolving
The Particle Swarm Paradigm is a Particle Swarm
A kind of program comprising a population of individuals that interact with
one another according to simple rules in order to solve problems, which
may be very complex.
It is an appropriate kind of
description of the process of
science.
Two Spaces
The social network topology, and
The state of the individual as a point in a Cartesian coordinate system
Moving point = a particle
A bunch of them = a swarm
Note memes:
“evolution of ideas”
vs.
“Changes in people
who hold ideas”
Minimizing or maximizing a
function result by adjusting
parameters
Cognition as Optimization
Cognitive consistency theories, incl. dissonance
Parallel constraint satisfaction
Feedforward Neural Nets
Particle swarm describes the dynamics of
the network, as opposed to its equilibrium
properties
Mind and Society
Minsky: Minds are simply what brains do.
No: minds are what socialized human brains do.
… solipsism …
Dynamic Social Impact Theory
Nowak, A., Szamrej, J., & Latané, B. (1990). From
private attitude to public opinion: A dynamic theory of
social impact. Psychological Review, 97, 362-376.
i=f(SIN)
Computer simulation – 2-d CA
Each individual is both a target
and source of influence
“Euclidean” neighborhoods
Binary, univariate individuals
“Strength” randomly assigned
Polarization
Particle Swarms: The Population
To understand the particle swarm, you have to understand the
interactions of the particles
One particle is the stupidest thing in the world
The population learns
Every particle is a teacher and a learner
Social learning, norms, conformity, group dynamics, social influence,
persuasion, self-presentation, cognitive dissonance, symbolic
interactionism, cultural evolution …
Neighborhoods:
Population topology
Gbest
Lbest
(All N=20)
Particles learn from one another. Their
communication structure determines how
solutions propagate through the population.
von Neumann (“Square”) Topology
Regular, easy to understand, works adequately with a variety of versions –
perhaps not the best for any version, but not the worst.
The Particle Swarm: Pseudocode
Initialize Population and constants
Repeat
Do i=1 to population size

CurrentEvali = eval( x i)
If CurrentEval < pbesti then do
pbesti = CurrentEvali
For d=1 to Dimension
pid =xid
Next d
If CurrentEval i < Pbestgbest then gbest=i
End if
g = best neighbor’s index
For d=1 to Dimension
vid = W*vid + U(0, AC) × (pid – xid) + U(0, AC) × (pgd – xid)
xid = xid + vid
Next d
Next i
Until termination criterion
Pseudocode Chunk #1

CurrentEvali = eval( x i)
If CurrentEvali < pbesti then do
pbesti = CurrentEvali
For d=1 to Dimension
pid =xid
Next d
If CurrentEval i < pbestgbest then gbest=i
End if
Can come at top or bottom of the loop
In gbest topology, g=gbest
It’s useful to track the population best
Pseudocode Chunk #2
g = best neighbor’s index
For d=1 to Dimension
vid = W*vid + U(0, AC) × (pid – xid) + U(0, AC) × (pgd – xid)
xid = xid + vid
Next d
Constriction Type 1”: W=0.7298, AC=W*2.05=1.496
Clerc 2006; W=0.7, AC=1.43
Inertia: W might vary with time, in (0.4, 0.9), etc., AC=2.0 typically
Note three components of velocity
Vmax - not necessary, might help
Q: Are these formulas arbitrary?
Some Standard Test Functions
Sphere
Rosenbrock
Rastrigin
Schaffer’s f6
Griewank
Test Functions: Typical Results
Step-Size Depends on Neighbors
pi=0
pg=0
Movement of the particle through the
search space is centered on the
mean of pi and pg on each dimension,
and its amplitude is scaled to their
difference.
pi=+2
pg=-2
Exploration vs. exploitation: automatic
pi=+0.1
pg=-0.1
… “the box” …
Search distribution – Scaled to Neighborhood
Previous bests constant
at +10
A million iterations
Q: What is the distribution of points that are tested by the particle?
“Bare Bones” particle swarm
x = G((pi + pg)/2, abs(pi – pg))
G(mean, s.d.) is Gaussian RNG
Simplified (!)
Works pretty well, but not as good as canonical.
Kurtosis
Peaked -- fat tails
Tails trimmed
Empirical observations
with p’s held constant
Not trimmed
Kurtosis
High peak, fat tails
Mean moments of the canonical particle swarm algorithm with
previous bests set at +20, varying the number of iterations.
Iterations
Mean
S.D.
Skewness
Kurtosis
1,000
0.0970
37.7303
-0.0617
8.008
3,000
0.0214
41.5281
0.0814
18.813
10,000
-0.0080
41.6614
-0.0679
40.494
100,000
0.0022
41.7229
0.2116
170.204
1,000,000
0.0080
41.3048
0.3808
342.986
Bursts of Outliers
50
40
30
20
10
0
-10
-20
-30
-40
-50
-60
“Volatility clustering” seems to typify
the particle’s trajectory
Adding Bursts of Outliers to Bare Bones PSO
Sphere
Griewank30
5
20
4
10
3
0
2
1
-10
0
-20
Center = (pid + pgd)/2
SD = |pid - pgd|
xid = G(0,1)
if Burst = 0 and U(0,1)<
PBurstStart then
Burst = U(0, maxpower)
Else If Burst > 0 and
U(0,1)< PBurstEnd then
Burst = 0
End If
If Burst > 0 then xid = xid ^ Burst
xid = Center + xid * SD
-1
-30
-2
-40
-3
-50
-4
-5
-60
Rosenbrock
Griewank10
17
0.4
15
-0.1
13
-0.6
11
-1.1
9
-1.6
7
-2.1
5
-2.6
3
-3.1
Rastrigin
f6
-4
5.7
-4.5
5.5
-5
5.3
5.1
4.9
-5.5
-6
-6.5
4.7
-7
4.5
4.3
-7.5
-8
4.1
(Bubbled line is canonical PS)
“The Box”
Where the particle goes next depends on which way it was already
going, the random numbers, and the sign and magnitude of the
differences.
vid = W*vid +
U(0, AC) × (pid – xid) +
U(0, AC) × (pgd – xid)
xid = x id + v id
The area where it can go it crisply bounded,
but the probability inside the box is not uniformly dense.
Empirical distribution of means of random
numbers from different ranges
Simulate with uniform RNG, trim tails
TUPS: Truncated-Uniform Particle Swarm
Start at current position: x(t)
Move weighted amount same direction: W1× (x(t) – x(t-1))
Find midpoint of extremes, and difference between them, on each dimension
(the sides of the box)
Weight that, add it to where you are, call it the “center”
Generate uniformly distributed random number around the center,
range slightly less than the length of the side
That’s x(t+1)
TUPS: Truncated-Uniform Particle Swarm
x(t+1)=x(t) +
W1× (x(t) – x(t-1)) +
W2 × ((U(-1,+1) × (width/2.5)) +
center)
Sphere
Rastrigin
2.6
2
2.4
-3
Canonical
-8
TUPS
FIPS
-13
-18
W1=0.729; W2=1.494
1.4
1000
2000
0
3000
Griewank30
Generates a point less than
Width/2 from the center
500
1000
1500
2000
2500
3000
500
1000
1500
2000
2500
3000
500
1000
1500
2000
2500
3000
Rosenbrock
2.2
7.4
1.2
Center is width/2
2
1.8
1.6
-23
0
Width is difference between
highest and lowest (p-x)
2.2
6.4
0.2
5.4
-0.8
4.4
-1.8
3.4
-2.8
2.4
-3.8
1.4
0
500
1000
1500
2000
2500
3000
0
f6
Griewank10
-2
0
-2.2
-2.4
-0.5
-2.6
-1
-2.8
-1.5
-3
-3.2
-2
0
500
1000
1500
2000
2500
3000
0
Binary Particle Swarms
S(x) = 1 / (1 + exp(-x))
Transform velocity with
sigmoid function in (0 .. 1)
v = [the usual]
if rand() < S(v) then x = 1
else x = 0
Use it as a probability
threshold
Though this is a radically different concept, the principles of particle
interaction still apply (because the power is in the interactions).
FIPS -- The “fully-informed” particle swarm
(Rui Mendes)
v(t+1) = W1 × v(t) + Σ(rand() × W2/K × (pk – x(t)))
x(t+1)=x(t)+v(t+1)
(K=number of neighbors, k=index of neighbor, W2 is a sum.)
Note that pi is not a source of influence in FIPS.
Doesn’t select best neighbor.
Orbits around the mean of neighborhood bests.
This version is more dependent on topology.
Deconstructing Velocity
Because x(t+1) = x(t) + v(t+1)
we know that on the previous iteration,
x(t) = x(t-1) + v(t)
So we can find v(t)
v(t) = x(t) – x(t-1)
and can substitute that into the formula, to put it all in one line:
xid (t+1) = xid (t) +
W1(xid (t) - xid(t-1)) +
(rand()×(W2)×(pid - xid(t)) +
(rand()×(W2)×(pgd - xid(t))
Generalization and Verbal Representation
We can generalize the canonical and FIPS versions:
xid (t+1) =
xid (t) +
W1(xid (t)- xid (t-1)) +
Σ(rand()×(W2/K)×(pkd - xid(t)))
… or in words …
NEW POSITION =
CURRENT POSITION +
PERSISTENCE +
SOCIAL INFLUENCE
Social Influence
has two components:
Central Tendency, and Dispersion
NEW POSITION=
CURRENT POSITION +
PERSISTENCE +
SOCIAL CENTRAL TENDENCY +
SOCIAL DISPERSION
… Hmmm, this gives us something to play with …!
Gaussian “Essential” Particle Swarm
Note that only the last term has randomness in it – the rest is deterministic
meanp=(pid + pgd)/2
disp=abs(pid – pgd)/2
xid (t+1)=
xid (t) +
W1(xid (t)- xid (t-1)) +
W2*(meanp – xid) +
G(0,1)*disp
G(0,1) is a Gaussian RNG
NEW POSITION=
CURRENT POSITION +
PERSISTENCE +
SOCIAL CENTRAL TENDENCY +
SOCIAL DISPERSION
Gaussian Essential Particle Swarm
xid (t+1)=
xid (t) +
W1(xid (t)- xid (t-1)) +
W2*(meanp – xid) +
G(0,1)*disp
Function
Trials
Canonical
Gaussian
F6
20
0.0015
13E-10
GRIEWANK
20
0.0086
0.0103
GRIEWANK10
20
0.0508
0.045
RASTRIGIN
20
56.862
49.151
ROSENBROCK
20
41.836
44.197
SPHERE
20
88E-16
38E-24
Various Probability Distributions
Function
Trials
Canonical
Gaussian
Triangular
DoubleExponential
Cauchy
F6
20
0.0015
13E-10
0.0057
0.001
0.0029
GRIEWANK
20
0.0086
0.0103
0.0275
0.0149
0.0253
GRIEWANK10
20
0.0508
0.045
0.0694
0.0541
0.0768
RASTRIGIN
20
56.862
49.151
140.94
47.26
33.829
ROSENBROCK
20
41.836
44.197
67.894
41.308
42.054
SPHERE
20
88E-16
38E-24
1.70E-18
2.40E-22
1.60E-17
There is clearly room for exploration here.
Gaussian FIPS
Fully-informed – uses all neighbors
FIPScenter= mean of (pkd – xid)
FIPSrange = mean of abs(pid - pkd)
xid= xid +
W1 × (xid(t)-xid(t-1)) +
W2 × FIPScenter +
G(0,1) × (FIPSrange/2);
Gaussian FIPS
Gaussian FIPS compared to Canonical PSO, square topology.
Function
Trials
Canonical
Gaussian
FIPS
F6
20
0.0015
0.001
GRIEWANK
20
0.0086
0.0007
GRIEWANK10
20
0.0508
0.0215
RASTRIGIN
20
56.862
36.858
ROSENBROCK
20
41.836
40.365
SPHERE
20
88E-16
41E-29
Gaussian FIPS vs. Canonical PSO
Modified Bonferroni correction
func
t
t(38), alpha=0.05
p-value
rank
inv.
rank
New
alpha
Sig.
SPHERE
3.17
0.0030
1
6
0.008333
GRIEWANK10
2.99
0.0048
2
5
0.010000
*
*
GRIEWANK
2.64
0.0118
3
4
0.012500
*
RASTRIGIN
2.21
0.0333
4
3
0.016667
.
F6
0.45
0.6583
5
2
0.025000
.
ROSENBROCK 0.15
0.8782
6
1
0.050000
.
Ref: Jaccard, J. & Wan, C. K. (1996). LISREL approaches to interaction
effects in multiple regression. Thousand Oaks, CA: Sage Publications.
Mendes: Two Measures of Performance
Color and shape indicate parameters of the social network –
degree, clustering, etc.
Best Topologies
FIPS versions
Best-neighbor versions
Worst FIPS Sociometries
and Proportions Successful
Understanding the Particle Swarm
Lots of variations in particle movement formulas
Teachers and learners
Propagation of knowledge is central
Interaction method and social network topology … interact
It’s simple, but difficult to understand
In Sum
There is some fundamental process
• Uses information from neighbors
• Seems to require balance between “persistence” and “influence”
Decomposing a version we know is OK
• We can understand it
• We can improve it
Particle Trajectories
• Arbitrary? -- not quite
• Can be replaced by RNG (trajectory is not the essence)
How it works
• It works by sharing successes among individuals
• Need to look more closely at the sharing phenomenon itself
Send me a note
Jim Kennedy
Kennedy.Jim@gmail.com
Download