======================================================================
=                        Simulated annealing                         =
======================================================================

                            Introduction
======================================================================
Simulated annealing (SA) is a probabilistic technique for
approximating the global optimum of a given function. Specifically, it
is a metaheuristic to approximate global optimization in a large
search space for an optimization problem. It is often used when the
search space is discrete (e.g., the traveling salesman problem). For
problems where finding an approximate global optimum is more important
than finding a precise local optimum in a fixed amount of time,
simulated annealing may be preferable to alternatives such as gradient
descent.

The name and inspiration come from annealing in metallurgy, a
technique involving heating and controlled cooling of a material to
increase the size of its crystals and reduce their defects. Both are
attributes of the material that depend on its thermodynamic free
energy. Heating and cooling the material affects both the temperature
and the thermodynamic free energy.
The simulation of annealing can be used to find an approximation of a
global minimum for a function with many variables.

This notion of slow cooling implemented in the simulated annealing
algorithm is interpreted as a slow decrease in the probability of
accepting worse solutions as the solution space is explored. Accepting
worse solutions is a fundamental property of metaheuristics because it
allows for a more extensive search for the global optimal solution. In
general, the simulated annealing algorithms work as follows. At each
time step, the algorithm randomly selects a solution close to the
current one, measures its quality, and then decides to move to it or
to stay with the current solution based on either one of two
probabilities between which it chooses on the basis of the fact that
the new solution is better or worse than the current one. During the
search, the temperature is progressively decreased from an initial
positive value to zero and affects the two probabilities: at each
step, the probability of moving to a better new solution is either
kept to 1 or is changed towards a positive value; on the other hand,
the probability of moving to a worse new solution is progressively
changed towards zero.

The simulation can be performed either by a solution of kinetic
equations for density functions or by using the stochastic sampling
method. The method is an adaptation of the Metropolis-Hastings
algorithm, a Monte Carlo method to generate sample states of a
thermodynamic system, published by N. Metropolis et al. in 1953.


                              Overview
======================================================================
The state of some physical systems, and the function 'E'('s') to be
minimized, is analogous to the internal energy of the system in that
state. The goal is to bring the system, from an arbitrary 'initial
state', to a state with the minimum possible energy.


The basic iteration
=====================
At each step, the simulated annealing heuristic considers some
neighboring state 's*' of the current state 's', and probabilistically
decides between moving the system to state 's*' or staying in-state
's'.  These probabilities ultimately lead the system to move to states
of lower energy.  Typically this step is repeated until the system
reaches a state that is good enough for the application, or until a
given computation budget has been exhausted.


The neighbours of a state
===========================
Optimization of a solution involves evaluating the neighbours of a
state of the problem, which are new states produced through
conservatively altering a given state. For example, in the travelling
salesman problem each state is typically defined as a permutation of
the cities to be visited, and its neighbours are the set of
permutations produced by reversing the order of any two successive
cities. The well-defined way in which the states are altered to
produce neighbouring states is called a "move", and different moves
give different sets of neighbouring states. These moves usually result
in minimal alterations of the last state, in an attempt to
progressively improve the solution through iteratively improving its
parts (such as the city connections in the traveling salesman
problem).

Simple heuristics like hill climbing, which move by finding better
neighbour after better neighbour and stop when they have reached a
solution which has no neighbours that are better solutions, cannot
guarantee to lead to any of the existing better solutions their
outcome may easily be just a local optimum, while the actual best
solution would be a global optimum that could be different.
Metaheuristics use the neighbours of a solution as a way to explore
the solutions space, and although they prefer better neighbours, they
also accept worse neighbours in order to avoid getting stuck in local
optima; they can find the global optimum if run for a long enough
amount of time.


Acceptance probabilities
==========================
The probability of making the transition from the current state s to a
candidate new state s' is specified by an 'acceptance probability
function' P(e, e', T), that depends on the energies e = E(s) and e' =
E(s') of the two states, and on a global time-varying parameter T
called the 'temperature'.  States with a smaller energy are better
than those with a greater energy.  The probability function P must be
positive even when e' is greater than e.  This feature prevents the
method from becoming stuck at a local minimum that is worse than the
global one.

When T tends to zero, the probability P(e, e', T) must tend to zero if
e' > e and to a positive value otherwise. For sufficiently small
values of T, the system will then increasingly favor moves that go
"downhill" (i.e., to lower energy values), and avoid those that go
"uphill."  With T=0 the procedure reduces to the greedy algorithm,
which makes only the downhill transitions.

In the original description of simulated annealing, the probability
P(e, e', T) was equal to 1 when e' < e�i.e., the procedure always
moved downhill when it found a way to do so, irrespective of the
temperature.  Many descriptions and implementations of simulated
annealing still take this condition as part of the method's
definition.  However, this condition is not essential for the method
to work.

The P function is usually chosen so that the probability of accepting
a move decreases when the difference
e'-e increases�that is, small uphill moves are more likely than large
ones.  However, this requirement is not strictly necessary, provided
that the above requirements are met.

Given these properties, the temperature T plays a crucial role in
controlling the evolution of the state s of the system with regard to
its sensitivity to the variations of system energies. To be precise,
for a large T, the evolution of s is sensitive to coarser energy
variations, while it is sensitive to finer energy variations when T is
small.

The specification of , , and  is partially redundant.  In practice,
it's common to use the same acceptance function  for many problems,
and adjust the other two functions according to the specific problem.

In the formulation of the method by Kirkpatrick et al., the acceptance
probability function P(e, e', T) was defined as 1 if e' < e, and
\exp(-(e'-e)/T) otherwise. This formula was superficially justified by
analogy with the transitions of a physical system; it corresponds to
the Metropolis-Hastings algorithm, in the case where T=1 and the
proposal distribution of Metropolis-Hastings is symmetric. However,
this acceptance probability is often used for simulated annealing even
when the  function, which is analogous to the proposal distribution in
Metropolis-Hastings, is not symmetric, or not probabilistic at all. As
a result, the transition probabilities of the simulated annealing
algorithm do not correspond to the transitions of the analogous
physical system, and the long-term distribution of states at a
constant temperature T need not bear any resemblance to the
thermodynamic equilibrium distribution over states of that physical
system, at any temperature. Nevertheless, most descriptions of
simulated annealing assume the original acceptance function, which is
probably hard-coded in many implementations of SA.

In 1990, Moscato and Fontanari , and independently Dueck and Scheuer ,
proposed that a deterministic update (i.e. one that is not based on
the probabilistic acceptance rule) could speed-up the optimization
process without impacting on the final quality. Moscato and Fontanari
conclude from observing the analogous of the "specific heat" curve of
the "threshold updating" annealing originating from their study that
"the stochasticity of the Metropolis updating in the simulated
annealing algorithm does not play a major role in the search of
near-optimal minima". Instead, they proposed that "the smoothening of
the cost function landscape at high temperature and the gradual
definition of the minima during the cooling process are the
fundamental ingredients for the success of simulated annealing." The
method subsequently popularized under the denomination of "threshold
accepting" due to Dueck and Scheuer's denomination. In 2001, Franz,
Hoffmann and Salamon showed that the deterministic update strategy is
indeed the optimal one within the large class of algorithms that
simulate a random walk on the cost/energy landscape .


The annealing schedule
========================
The name and inspiration of the algorithm demand an interesting
feature related to the temperature variation to be embedded in the
operational characteristics of the algorithm. This necessitates a
gradual reduction of the temperature as the simulation proceeds. The
algorithm starts initially with T set to a high value (or infinity),
and then it is decreased at each step following some 'annealing
schedule'�which may be specified by the user, but must end with T=0
towards the end of the allotted time budget.  In this way, the system
is expected to wander initially towards a broad region of the search
space containing good solutions, ignoring small features of the energy
function; then drift towards low-energy regions that become narrower
and narrower; and finally move downhill according to the steepest
descent heuristic.

For any given finite problem, the probability that the simulated
annealing algorithm terminates with a global optimal solution
approaches 1 as the annealing schedule is extended. This theoretical
result, however, is not particularly helpful, since the time required
to ensure a significant probability of success will usually exceed the
time required for a complete search of the solution space.


                             Pseudocode
======================================================================
The following pseudocode presents the simulated annealing heuristic as
described above. It starts from a state  and continues until a maximum
of  steps have been taken. In the process, the call  should generate a
randomly chosen neighbour of a given state ; the call  should pick and
return a value in the range , uniformly at random. The annealing
schedule is defined by the call , which should yield the temperature
to use, given the fraction  of the time budget that has been expended
so far.



* Let
* For  through  (exclusive):
**
** Pick a random neighbour,
** If :
***
* Output: the final state


                      Selecting the parameters
======================================================================
In order to apply the simulated annealing method to a specific
problem, one must specify the following parameters: the state space,
the energy (goal) function , the candidate generator procedure , the
acceptance probability function , and the annealing schedule  AND
initial temperature . These choices can have a significant impact on
the method's effectiveness.  Unfortunately, there are no choices of
these parameters that will be good for all problems, and there is no
general way to find the best choices for a given problem.  The
following sections give some general guidelines.


Sufficiently near neighbour
=============================
Simulated annealing may be modeled as a random walk on a search graph,
whose vertices are all possible states, and whose edges are the
candidate moves. An essential requirement for the  function is that it
must provide a sufficiently short path on this graph from the initial
state to any state which may be the global optimum the diameter of the
search graph must be small. In the traveling salesman example above,
for instance, the search space for n = 20 cities has n! =
2,432,902,008,176,640,000 (2.4 quintillion) states; yet the neighbor
generator function that swaps two consecutive cities can get from any
state (tour) to any other state in at most \sum_{k=1}^{n-1}
k=\frac{n(n-1)}{2}=190 steps.


Transition probabilities
==========================
To investigate the behavior of simulated annealing on a particular
problem, it can be useful to consider the 'transition probabilities'
that result from the various design choices made in the implementation
of the algorithm.  For each edge (s,s') of the search graph, the
transition probability is defined as the probability that the
simulated annealing algorithm will move to state  s' when its current
state is s.  This probability depends on the current temperature as
specified by , on the order in which the candidate moves are generated
by the  function, and on the acceptance probability function . (Note
that the transition probability is not simply P(e, e', T), because the
candidates are tested serially.)


Efficient candidate generation
================================
When choosing the candidate generator , one must consider that after a
few iterations of the simulated annealing algorithm, the current state
is expected to have much lower energy than a random state. Therefore,
as a general rule, one should skew the generator towards candidate
moves where the energy of the destination state s' is likely to be
similar to that of the current state. This heuristic (which is the
main principle of the Metropolis-Hastings algorithm) tends to exclude
"very good" candidate moves as well as "very bad" ones; however, the
former are usually much less common than the latter, so the heuristic
is generally quite effective.

In the traveling salesman problem above, for example, swapping two
'consecutive' cities in a low-energy tour is expected to have a modest
effect on its energy (length); whereas swapping two 'arbitrary' cities
is far more likely to increase its length than to decrease it. Thus,
the consecutive-swap neighbour generator is expected to perform better
than the arbitrary-swap one, even though the latter could provide a
somewhat shorter path to the optimum (with n-1 swaps, instead of
n(n-1)/2).

A more precise statement of the heuristic is that one should try first
candidate states s' for which P(E(s), E(s'), T) is large. For the
"standard" acceptance function P above, it means that E(s') - E(s) is
on the order of T or less. Thus, in the traveling salesman example
above, one could use a  function that swaps two random cities, where
the probability of choosing a city-pair vanishes as their distance
increases beyond T.


Barrier avoidance
===================
When choosing the candidate generator  one must also try to reduce the
number of "deep" local minima�states (or sets of connected states)
that have much lower energy than all its neighbouring states. Such
"closed catchment basins" of the energy function may trap the
simulated annealing algorithm with high probability (roughly
proportional to the number of states in the basin) and for a very long
time (roughly exponential on the energy difference between the
surrounding states and the bottom of the basin).

As a rule, it is impossible to design a candidate generator that will
satisfy this goal and also prioritize candidates with similar energy.
On the other hand, one can often vastly improve the efficiency of
simulated annealing by relatively simple changes to the generator. In
the traveling salesman problem, for instance, it is not hard to
exhibit two tours A, B, with nearly equal lengths, such that (1) A is
optimal, (2) every sequence of city-pair swaps that converts A to B
goes through tours that are much longer than both, and (3) A can be
transformed into B by flipping (reversing the order of) a set of
consecutive cities. In this example, A and B lie in different "deep
basins" if the generator performs only random pair-swaps; but they
will be in the same basin if the generator performs random
segment-flips.


Cooling schedule
==================
The physical analogy that is used to justify simulated annealing
assumes that the cooling rate is low enough for the probability
distribution of the current state to be near thermodynamic equilibrium
at all times.  Unfortunately, the 'relaxation time'�the time one must
wait for the equilibrium to be restored after a change in
temperature�strongly depends on the "topography" of the energy
function and on the current temperature.  In the simulated annealing
algorithm, the relaxation time also depends on the candidate
generator, in a very complicated way.  Note that all these parameters
are usually provided as black box functions to the simulated annealing
algorithm. Therefore, the ideal cooling rate cannot be determined
beforehand, and should be empirically adjusted for each problem.
Adaptive simulated annealing algorithms address this problem by
connecting the cooling schedule to the search progress. Other adaptive
approach as Thermodynamic Simulated Annealing, automatically adjusts
the temperature at each step based on the energy difference between
the two states, according to the laws of thermodynamics.


                              Restarts
======================================================================
Sometimes it is better to move back to a solution that was
significantly better rather than always moving from the current state.
This process is called 'restarting' of simulated annealing.  To do
this we set s and e to sbest and ebest and perhaps restart the
annealing schedule.  The decision to restart could be based on several
criteria. Notable among these include restarting based on a fixed
number of steps, based on whether the current energy is too high
compared to the best energy obtained so far, restarting randomly, etc.


                          Related methods
======================================================================
* Interacting Metropolis-Hasting algorithms (a.k.a. Sequential Monte
Carlo) combined simulated annealing moves with an acceptance-rejection
of the best fitted individuals equipped with an interacting recycling
mechanism.
* Quantum annealing uses "quantum fluctuations" instead of thermal
fluctuations to get through high but thin barriers in the target
function.
* Stochastic tunneling attempts to overcome the increasing difficulty
simulated annealing runs have in escaping from local minima as the
temperature decreases, by 'tunneling' through barriers.
* Tabu search normally moves to neighbouring states of lower energy,
but will take uphill moves when it finds itself stuck in a local
minimum; and avoids cycles by keeping a "taboo list" of solutions
already seen.
* Dual-phase evolution is a family of algorithms and processes (to
which simulated annealing belongs) that mediate between local and
global search by exploiting phase changes in the search space.
* Reactive search optimization focuses on combining machine learning
with optimization, by adding an internal feedback loop to self-tune
the free parameters of an algorithm to the characteristics of the
problem, of the instance, and of the local situation around the
current solution.
* Stochastic gradient descent runs many greedy searches from random
initial locations.
* Genetic algorithms maintain a pool of solutions rather than just
one. New candidate solutions are generated not only by "mutation" (as
in SA), but also by "recombination" of two solutions from the pool.
Probabilistic criteria, similar to those used in SA, are used to
select the candidates for mutation or combination, and for discarding
excess solutions from the pool.
* Memetic algorithms search for solutions by employing a set of agents
that both cooperate and compete in the process; sometimes the agents'
strategies involve simulated annealing procedures for obtaining high
quality solutions before recombining them. Annealing has also been
suggested as a mechanism for increasing the diversity of the search .
* Graduated optimization digressively "smooths" the target function
while optimizing.
* Ant colony optimization (ACO) uses many ants (or agents) to traverse
the solution space and find locally productive areas.
* The cross-entropy method (CE) generates candidates solutions via a
parameterized probability distribution. The parameters are updated via
cross-entropy minimization, so as to generate better samples in the
next iteration.
* Harmony search mimics musicians in improvisation process where each
musician plays a note for finding a best harmony all together.
* Stochastic optimization is an umbrella set of methods that includes
simulated annealing and numerous other approaches.
* Particle swarm optimization is an algorithm modelled on swarm
intelligence that finds a solution to an optimization problem in a
search space, or model and predict social behavior in the presence of
objectives.
* The runner-root algorithm (RRA) is a meta-heuristic optimization
algorithm for solving unimodal and multimodal problems inspired by the
runners and roots of plants in nature.
* Intelligent water drops algorithm (IWD) which mimics the behavior of
natural water drops to solve optimization problems
* Parallel tempering is a simulation of model copies at different
temperatures (or Hamiltonians) to overcome the potential barriers.


                          Further reading
======================================================================
*A. Das and B. K. Chakrabarti (Eds.),
'[ftp://nozdr.ru/biblio/kolxoz/M/MP/Das%20A.,%20Chakrabarti%20B.K.%20(eds.)%20Qu
antum%20Annealing%20and%20Related%20Optimization%20Methods%20(LNP0679,%20Springe
r,%202005)(384s)_MP_.pdf
Quantum Annealing and Related Optimization Methods],' Lecture Note in
Physics, Vol. 679, Springer, Heidelberg (2005)
*
*
*
*V.Vassilev, A.Prahova: "The Use of Simulated Annealing in the Control
of Flexible Manufacturing Systems", International Journal INFORMATION
THEORIES & APPLICATIONS,
[http://www.foibg.com/ijita/vol01-09/ijita-fv06.htm VOLUME 6/1999]


                           External links
======================================================================
* [http://www.heatonresearch.com/aifh/vol1/tsp_anneal.html Simulated
Annealing] A Javascript app that allows you to experiment with
simulated annealing. Source code included.
*
[http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=10548&
amp;objectType=file
"General Simulated Annealing Algorithm"] An open-source MATLAB program
for general simulated annealing exercises.
* Self-Guided Lesson on Simulated Annealing A Wikiversity project.
*
[https://arstechnica.com/science/news/2009/12/uncertainty-hovers-over-claim-goog
les-using-quantum-computer.ars
Google in superposition of using, not using quantum computer] Ars
Technica discusses the possibility that the D-Wave computer being used
by Google may, in fact, be an efficient simulated annealing
co-processor.


License
=========
All content on Gopherpedia comes from Wikipedia, and is licensed under CC-BY-SA
License URL: http://creativecommons.org/licenses/by-sa/3.0/
Original Article: http://en.wikipedia.org/wiki/Simulated_annealing