Particle Swarm Optimization
A/Prof. Xiaodong Li
School of Computer Science and IT, RMIT University
Melbourne, Australia
Email: xiaodong.li@rmit.edu.au
Nov 2013
Outline

 Background on Swarm Intelligence
 Introduction to Particle Swarm Optimization (PSO)
   Original PSO, inertia weight, constriction coefficient
 Particle trajectories
   Simplified PSO; one or two particles
 Convergence aspects
 FIPS, Bare-bones, and other PSO variants
 Communication topologies
 Further information

21/03/2016
Swarm Intelligence
Swarm intelligence (SI) is an artificial intelligence technique based on the study of collective behaviour in decentralized, self-organized systems.

SI systems are typically made up of a population of simple agents interacting locally with one another and with their environment. Although there is normally no centralized control structure dictating how individual agents should behave, local interactions between such agents often lead to the emergence of global behaviour. Examples of such systems can be found in nature, including ant colonies, bird flocking, animal herding, bacterial growth and fish schooling (from Wikipedia).
Swarm Intelligence
Mind is social…
Human intelligence results from social interaction:
Evaluating, comparing, and imitating one another, learning from experience and
emulating the successful behaviours of others, people are able to adapt to
complex environments through the discovery of relatively optimal patterns of
attitudes, beliefs, and behaviours (Kennedy & Eberhart, 2001).
Culture and cognition are inseparable consequences of human sociality:
Culture emerges as individuals become more similar through mutual social
learning. The sweep of culture moves individuals toward more adaptive
patterns of thought and behaviour.
To model human intelligence, we should model
individuals in a social context, interacting with
one another.
Particle Swarm Optimization
The inventors: James Kennedy and Russell Eberhart
Particle Swarm Optimization
PSO has its roots in Artificial Life and social psychology, as well as
engineering and computer science.
The particle swarms are in some ways closely related to cellular automata (CA):
 a) individual cell updates are done in parallel;
 b) each new cell value depends only on the old values of the cell and its neighbours; and
 c) all cells are updated using the same rules (Rucker, 1999).

[Figure: the "Blinker" and "Glider" CA patterns.]

Individuals in a particle swarm can be conceptualized as cells in a CA, whose states change in many dimensions simultaneously.
Particle Swarm Optimization
As described by the inventors James Kennedy and Russell Eberhart, the "particle swarm algorithm imitates human (or insect) social behaviour. Individuals interact with one another while learning from their own experience, and gradually the population members move into better regions of the problem space".

Why "particles", not "points"? Both Kennedy and Eberhart felt that velocities and accelerations are more appropriately applied to particles.
PSO applications
Problems with continuous, discrete, or mixed search spaces, with multiple local minima; problems with constraints; multiobjective and dynamic optimization.
 Evolving neural networks:
  • Human tumor analysis;
  • Computer numerically controlled milling optimization;
  • Battery pack state-of-charge estimation;
  • Real-time training of neural networks (diabetes among Pima Indians);
  • Servomechanism (time series prediction optimizing a neural network).
 Reactive power and voltage control;
 Ingredient mix optimization;
 Pressure vessel (design of a container of compressed air, with many constraints);
 Compression spring (cylindrical compression spring with certain mechanical characteristics);
 Moving Peaks (dynamic environment with multiple peaks); and more.
PSO can be tailor-designed to deal with specific real-world problems.
Original PSO

  v_i ← v_i + φ_1 ⊗ (p_i − x_i) + φ_2 ⊗ (p_g − x_i)
  x_i ← x_i + v_i

 x_i denotes the current position of the i-th particle in the swarm;
 v_i denotes the velocity of the i-th particle;
 p_i is the best position found by the i-th particle so far, i.e., its personal best;
 p_g is the best position found in the particle's neighbourhood, i.e., the global best;
 The symbol ⊗ denotes point-wise vector multiplication;
 φ_1 = c_1 r_1 and φ_2 = c_2 r_2;
 r_1 and r_2 are two vectors of random numbers uniformly chosen from [0, 1];
 c_1 and c_2 are acceleration coefficients.
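In code, one update of the original (inertia-free) PSO for a single particle might look like the following sketch. The function name, the c1 = c2 = 2.0 defaults, and the seeded generator are illustrative choices, not values fixed by the slides:

```python
import numpy as np

def pso_step(x, v, p_i, p_g, c1=2.0, c2=2.0, rng=np.random.default_rng(0)):
    """One velocity/position update of the original PSO for one particle.

    x, v, p_i, p_g are vectors of equal length.
    """
    phi1 = c1 * rng.random(x.shape)   # phi_1 = c1 * r1, drawn per dimension
    phi2 = c2 * rng.random(x.shape)   # phi_2 = c2 * r2
    v = v + phi1 * (p_i - x) + phi2 * (p_g - x)   # point-wise products
    x = x + v
    return x, v

x, v = pso_step(np.zeros(2), np.zeros(2),
                p_i=np.array([1.0, 1.0]), p_g=np.array([2.0, 2.0]))
```

Note that both attraction terms use freshly drawn random vectors, so the pull towards p_i and p_g varies per dimension and per step.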
Original PSO

  v_i ← v_i + φ_1 ⊗ (p_i − x_i) + φ_2 ⊗ (p_g − x_i)
       [momentum]  [cognitive component]  [social component]
  x_i ← x_i + v_i

The velocity v_i (which denotes the amount of change) of the i-th particle is determined by three components:
 momentum – the previous velocity term, which carries the particle in the direction it has travelled so far;
 cognitive component – the tendency to return to the best position visited so far;
 social component – the tendency to be attracted towards the best position found in its neighbourhood.
Neighbourhood topologies (e.g., ring, star, or von Neumann) can be used to control information propagation between particles, giving lbest and gbest PSOs.
Pseudo-code of a basic PSO

Randomly generate an initial population
repeat
    for i = 1 to population_size do
        if f(x_i) < f(p_i) then p_i = x_i;
        p_g = min(p_neighbours);
        for d = 1 to dimensions do
            velocity_update();
            position_update();
        end
    end
until termination criterion is met.
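The pseudo-code above can be turned into a compact, runnable sketch. This is a minimal gbest variant; the function name, the sphere test function, and the population/iteration sizes are my choices, while the constants w = 0.7298 and c1 = c2 = 1.49618 are the values quoted on the inertia-weight slide:

```python
import numpy as np

def basic_pso(f, dim, n_particles=20, iters=200,
              w=0.7298, c1=1.49618, c2=1.49618,
              bounds=(-20.0, 20.0), seed=1):
    """Minimise f with a gbest, inertia-weighted PSO (minimal sketch)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))   # positions
    v = np.zeros((n_particles, dim))              # velocities
    p = x.copy()                                  # personal bests
    p_val = np.array([f(xi) for xi in x])
    for _ in range(iters):
        g = p[np.argmin(p_val)]                   # neighbourhood best (gbest)
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)
        x = x + v
        fx = np.array([f(xi) for xi in x])
        improved = fx < p_val                     # update personal bests
        p[improved] = x[improved]
        p_val[improved] = fx[improved]
    i = np.argmin(p_val)
    return p[i], p_val[i]

best, val = basic_pso(lambda z: float(np.sum(z * z)), dim=2)
```

On the 2-D sphere function this settles very close to the minimum at the origin within a couple of hundred iterations.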
Inertia weight

The p_i and p_g terms can be collapsed into a single term without losing any information:

  v_i ← v_i + φ ⊗ (p − x_i)
  x_i ← x_i + v_i

where φ = φ_1 + φ_2 and

  p = (φ_1 ⊗ p_i + φ_2 ⊗ p_g) / (φ_1 + φ_2)

p represents the weighted average of p_i and p_g. Note that the division operator is a point-wise vector division.

Since the velocity term tends to keep the particle moving in the same direction as its previous flight, a coefficient, the inertia weight w, can be used to control this influence:

  v_i ← w v_i + φ_1 ⊗ (p_i − x_i) + φ_2 ⊗ (p_g − x_i)

The inertia-weighted PSO can converge under certain conditions even without using Vmax.
Inertia weight

The inertia weight can be used to control exploration and exploitation:
 For w ≥ 1: velocities increase over time and the swarm diverges;
 For 0 < w < 1: particles decelerate; convergence depends on the values of c1 and c2;
 For w < 0: velocities decrease over time, eventually reaching 0; convergent behaviour.

Empirical results suggest that a constant inertia weight w = 0.7298 with c1 = c2 = 1.49618 provides good convergent behaviour.

Eberhart and Shi also suggested using an inertia weight that decreases over time, typically from 0.9 to 0.4. This narrows the search, gradually changing from an exploratory to an exploitative mode.
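The decreasing schedule is just linear interpolation over the run; a one-line sketch (the function name is mine, the 0.9 → 0.4 endpoints are the typical values mentioned above):

```python
def inertia(t, t_max, w_start=0.9, w_end=0.4):
    """Inertia weight decreasing linearly from w_start to w_end over t_max steps."""
    return w_start - (w_start - w_end) * (t / t_max)

w_mid = inertia(50, 100)   # halfway through a 100-iteration run
```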
Visualizing PSO

[Figure: vector diagram of a single update. The updated position x_i is reached by adding the previous velocity v_i, the cognitive pull φ_1 ⊗ (p_i − x_i) along p_i − x_i, and the social pull φ_2 ⊗ (p_g − x_i) along p_g − x_i.]
Constriction PSO

Clerc and Kennedy (2000) suggested a generalized PSO in which a constriction coefficient χ is applied to both terms of the velocity formula. The Constriction Type 1'' PSO is equivalent to the inertia-weighted PSO:

  v_i ← χ (v_i + φ_1 ⊗ (p_i − x_i) + φ_2 ⊗ (p_g − x_i))
  x_i ← x_i + v_i

where

  χ = 2k / |2 − φ − sqrt(φ² − 4φ)|

and φ = φ_1 + φ_2, with φ_1 = c_1 r_1 and φ_2 = c_2 r_2.

If φ > 4 and k is in [0, 1], then the swarm is guaranteed to converge; k controls the balance between exploration and exploitation.

Typically, k is set to 1 and c1 = c2 = 2.05, giving a constriction coefficient χ ≈ 0.7298 (Clerc and Kennedy, 2002).
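Plugging the typical values into the formula recovers the familiar constant. A small sketch (function name mine; φ is taken as c1 + c2 here, which is what the quoted defaults imply):

```python
import math

def constriction(c1=2.05, c2=2.05, k=1.0):
    """Clerc-Kennedy constriction coefficient; requires phi = c1 + c2 > 4."""
    phi = c1 + c2
    return 2.0 * k / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))

chi = constriction()   # close to 0.7298 for c1 = c2 = 2.05, k = 1
```

This also makes the equivalence with the inertia-weighted PSO concrete: χ ≈ 0.7298 matches the empirically recommended w, and χ·2.05 ≈ 1.496 matches the recommended c1 and c2.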
Particle Trajectory

Question: how important are the interactions between particles in a PSO?

To answer this question, we can study a simplified PSO and look at scenarios where the swarm is reduced to only one or two particles. This simplified PSO assumes:
 no stochastic component;
 one dimension;
 a pre-specified initial position and velocity.

  v ← w v + c_1 (p_i − x) + c_2 (p_g − x)
  x ← x + v

Acknowledgement: this example was taken from Clerc's book "Particle Swarm Optimization", with some modifications.

In the following examples, we assume w = 0.7 and c1 = c2 = 0.7. Note that even with just one particle, we actually know two positions, x and p_i.

Consider the 1D Parabola function, f(x) = x², defined on [−20, 20]. We have two cases:
 1) The first two positions are on the same side of the minimum (initial position x = −20, v = 3.2);
 2) The first two positions frame the minimum (initial position x = −2, v = 6.4).
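Because the simplified PSO is deterministic, both cases can be replayed in a few lines. This sketch (function name mine) tracks a single particle, for which p_i and p_g coincide:

```python
def trajectory(x0, v0, steps=50, w=0.7, c=0.7):
    """Deterministic one-particle run of the simplified PSO on f(x) = x**2."""
    f = lambda z: z * z
    x, v, p = x0, v0, x0                     # p: best position seen so far
    xs = [x]
    for _ in range(steps):
        v = w * v + c * (p - x) + c * (p - x)   # cognitive + social, p_i = p_g
        x = x + v
        if f(x) < f(p):
            p = x
        xs.append(x)
    return xs

same_side = trajectory(-20.0, 3.2)   # Case 1: both starting points left of 0
framing = trajectory(-2.0, 6.4)      # Case 2: starting points frame 0
```

In Case 1 every new position improves on the last, so p always equals x, the attraction terms vanish, and the velocity simply decays geometrically: the particle stalls well short of the minimum. In Case 2 the particle overshoots, p and x separate, and the resulting oscillation carries it much closer to 0.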
Particle Trajectory (one particle)

[Figure: fitness vs. x for the two cases.]

Case 1: The first two positions are on the same side of the minimum. Since the personal best is always equal to x, the particle is unable to reach the minimum (premature convergence).

Case 2: The first two positions frame the minimum. The particle oscillates around the minimum; the personal best is not always equal to x, resulting in better convergence behaviour.
Particle Trajectory (one particle)

[Figure: phase-space plots (velocity vs. x) for the two cases.]

Case 1: The first two positions are on the same side of the minimum. The phase-space graph shows v reaching 0 too early, resulting in premature convergence.

Case 2: The first two positions frame the minimum. The phase-space graph shows v taking both positive and negative values (spiralling, convergent behaviour).
Particle Trajectory (two particles)

[Figure: graph of influence between two explorers (1, 2) and two memories (m1, m2).]

Graph of influence. In this case, we have two explorers and two memories. Each explorer receives information from the two memories, but informs only one (Clerc, 2006).
Particle Trajectory (two particles)

[Figure: fitness vs. x for the two particles, starting as in Case 1 (left) and Case 2 (right).]

Now we have two particles (two explorers and two memories). The starting positions for the two particles are the same as in Cases 1 and 2, but now the particles are working together (Clerc, 2006).

Note, however, that here memory 2 is always better than memory 1, so the course of explorer 2 is exactly the same as in Case 2 above (right-hand figure). Explorer 1, on the other hand, benefits from the information provided by memory 2, i.e., it ends up converging (left-hand figure).
Particle Trajectory (two particles)

[Figure: fitness vs. x for both explorers.]

Two explorers and two memories. This is the more general case, where each explorer is from time to time influenced by the memory of the other, when that memory is better than its own. Convergence is more probable, though it may be slower.
Particle Trajectory (two particles)

[Figure: particle trajectories in phase space (velocity vs. position).]

Two explorers and two memories. The two particles help each other to enter and remain in the oscillatory process that allows convergence towards the optimum.
A Potentially Dangerous Property

 What happens when x_i = p_i = p_g?
 Then the velocity update depends only on w v_i.
 If this condition persists for a number of iterations, w v_i → 0, and the particle stops moving.
 Solution: let the global best particle perform a local search, and use mutation to break the condition.
Fully Informed Particle Swarm (FIPS)

The previous velocity equation shows that a particle tends to converge towards a point determined by p, the weighted average of its previous best p_i and the neighbourhood's best p_g. p can be further generalized to any number of terms:

  p = ( Σ_{k∈N} φ_k ⊗ p_k ) / ( Σ_{k∈N} φ_k ),   with φ_k ~ U[0, c_max/|N|]

where N denotes the neighbourhood, and p_k the best previous position found by the k-th particle in N. If the size of N equals 2, with p_1 = p_i and p_2 = p_g, then the above is a generalization of the canonical PSO.

A significant implication of this generalization is that it allows us to think more freely, employing terms of influence other than just p_i and p_g.
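The generalized attractor p is easy to compute directly. A sketch (function name, c_max = 4.1, and the seed are my choices; c_max plays the role of the total acceleration φ):

```python
import numpy as np

def fips_attractor(neighbour_bests, c_max=4.1, rng=np.random.default_rng(2)):
    """Weighted centre of attraction p used by FIPS.

    neighbour_bests has shape (|N|, dim): one previous-best vector
    per particle in the neighbourhood.
    """
    n = len(neighbour_bests)
    phi = rng.uniform(0.0, c_max / n, neighbour_bests.shape)  # phi_k ~ U[0, c_max/|N|]
    return (phi * neighbour_bests).sum(axis=0) / phi.sum(axis=0)

p = fips_attractor(np.array([[0.0, 0.0], [1.0, 1.0]]))
```

Because the weights are non-negative, p is always a (per-dimension) convex combination of the neighbourhood's previous bests.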
Bare Bones PSO

What if we drop the velocity term? Is it necessary?

Kennedy (2003) carried out some experiments using a PSO variant which drops the velocity term from the PSO equation.

If p_i and p_g were kept constant, a canonical PSO samples the search space following a bell-shaped distribution centred exactly between p_i and p_g.

This bare-bones PSO produces normally distributed random numbers around the mean (p_id + p_gd)/2 (for each dimension d), with the standard deviation of the Gaussian distribution being |p_id − p_gd|.
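The whole update then collapses to one Gaussian draw per dimension. A sketch (function name and seed are mine):

```python
import numpy as np

def bare_bones_step(p_i, p_g, rng=np.random.default_rng(3)):
    """Bare-bones PSO: sample the new position per dimension from a
    Gaussian with mean (p_i + p_g)/2 and std |p_i - p_g|; no velocity."""
    mean = (p_i + p_g) / 2.0
    std = np.abs(p_i - p_g)
    return rng.normal(mean, std)

x = bare_bones_step(np.array([0.0, 2.0]), np.array([4.0, 2.0]))
```

Note that in any dimension where p_i and p_g agree, the spread is zero, so the particle lands exactly on the shared value there.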
Binary PSO

 The position update changes to:

  x_ij(t+1) = 1 if U(0,1) < sig(v_ij(t+1)), and 0 otherwise

where

  sig(v) = 1 / (1 + e^(−v))
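In other words, the velocity no longer displaces the particle; its squashed value is the probability of setting each bit. A sketch (function name and seed are mine):

```python
import numpy as np

def binary_position_update(v, rng=np.random.default_rng(4)):
    """Binary PSO position update: bit j is set iff U(0,1) < sig(v_j)."""
    sig = 1.0 / (1.0 + np.exp(-v))            # logistic squashing of velocity
    return (rng.random(v.shape) < sig).astype(int)

bits = binary_position_update(np.array([-3.0, 0.0, 3.0]))
```

A strongly negative velocity makes a bit very likely to be 0, a strongly positive one very likely to be 1, and v = 0 gives a fair coin flip.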
Some PSO variants

 Tribes (Clerc, 2006) – aims to adapt the population size so that it does not have to be set by the user; Tribes has also been used for discrete and mixed (discrete/continuous) problems;
 ARPSO (Riget and Vesterstorm, 2002) – uses a diversity measure to alternate between two phases;
 Dissipative PSO (Xie, et al., 2002) – increases randomness;
 PSO with self-organized criticality (Lovbjerg and Krink, 2002) – aims to improve diversity;
 Self-organizing Hierarchical PSO (Ratnaweera, et al., 2004);
 FDR-PSO (Veeramachaneni, et al., 2003) – uses nearest-neighbour interactions;
 PSO with mutation (Higashi and Iba, 2003; Stacey, et al., 2004);
 Cooperative PSO (van den Bergh and Engelbrecht, 2005) – a cooperative approach;
 DEPSO (Zhang and Xie, 2003) – aims to combine DE with PSO;
 CLPSO (Liang, et al., 2006) – incorporates learning from more previous best particles.
Communication topologies (1)
Two most common models:
 gbest: each particle is influenced by the best found from the entire swarm.
 lbest: each particle is influenced only by particles in local neighbourhood.
Communication topologies (2)

[Figure: graph of influence of a swarm of 7 particles. For each arc, the origin particle influences (informs) the end particle (Clerc, 2006).]

This graph of influence can also be expanded to include previous best positions (i.e., memories).
Communication topologies (3)

[Figure: three example topologies – global, island model, and fine-grained.]
Communication topologies (4)

Which one to use? It is a balance between exploration and exploitation.

The gbest model propagates information the fastest in the population, while the lbest model using a ring structure is the slowest. For complex multimodal functions, propagating information the fastest might not be desirable; however, if propagation is too slow, more iterations may be needed, incurring a higher computational cost.

Mendes and Kennedy (2002) found that the von Neumann topology (north, south, east and west of each particle placed on a two-dimensional lattice) seems to be an overall winner among many different communication topologies.
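The von Neumann neighbourhood is straightforward to index on a wrapping lattice. A sketch of one possible indexing scheme (function name and the row-major, toroidal layout are my assumptions):

```python
def von_neumann_neighbours(i, rows, cols):
    """Indices of the north, south, west and east neighbours of particle i
    on a wrapping rows x cols lattice, stored row-major."""
    r, c = divmod(i, cols)
    return [((r - 1) % rows) * cols + c,   # north
            ((r + 1) % rows) * cols + c,   # south
            r * cols + (c - 1) % cols,     # west
            r * cols + (c + 1) % cols]     # east

nbrs = von_neumann_neighbours(0, rows=4, cols=5)
```

Each particle then reads its p_g from these four neighbours (plus, in some formulations, itself), giving information propagation speed between the ring and gbest extremes.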
Readings on PSO

 Riccardo Poli, James Kennedy, and Tim Blackwell (2007), "Particle Swarm Optimization: An Overview", Swarm Intelligence, 1(1): 33–57.
 Kennedy, J., Eberhart, R.C., and Shi, Y. (2001), Swarm Intelligence, New York: Morgan Kaufmann Publishers.