======================================================================
=                    Travelling salesman problem                     =
======================================================================

                            Introduction
======================================================================
The travelling salesman problem (also called the travelling
salesperson problem or TSP) asks the following question: "Given a list
of cities and the distances between each pair of cities, what is the
shortest possible route that visits each city and returns to the
origin city?" It is an NP-hard problem in combinatorial optimization,
important in operations research and theoretical computer science.

The travelling purchaser problem and the vehicle routing problem are
both generalizations of TSP.

In the theory of computational complexity, the decision version of the
TSP (where, given a length 'L', the task is to decide whether the
graph has any tour shorter than 'L') belongs to the class of
NP-complete problems. Thus, it is possible that the worst-case running
time for any algorithm for the TSP increases superpolynomially (but no
more than exponentially) with the number of cities.

The problem was first formulated in 1930 and is one of the most
intensively studied problems in optimization. It is used as a
benchmark for many optimization methods. Even though the problem is
computationally difficult, a large number of heuristics and exact
algorithms are known, so that some instances with tens of thousands of
cities can be solved completely and even problems with millions of
cities can be approximated within a small fraction of 1%.

The TSP has several applications even in its purest formulation, such
as planning, logistics, and the manufacture of microchips. Slightly
modified, it appears as a sub-problem in many areas, such as DNA
sequencing. In these applications, the concept 'city' represents, for
example, customers, soldering points, or DNA fragments, and the
concept 'distance' represents travelling times or cost, or a
similarity measure between DNA fragments. The TSP also appears in
astronomy, as astronomers observing many sources will want to minimize
the time spent moving the telescope between the sources. In many
applications, additional constraints such as limited resources or time
windows may be imposed.


                              History
======================================================================
The origins of the travelling salesman problem are unclear. A handbook
for travelling salesmen from 1832 mentions the problem and includes
example tours through Germany and Switzerland, but contains no
mathematical treatment.


The travelling salesman problem was mathematically formulated in the
1800s by the Irish mathematician W.R. Hamilton and by the British
mathematician Thomas Kirkman. Hamilton's Icosian Game was a
recreational puzzle based on finding a Hamiltonian cycle. The general
form of the TSP appears to have been first studied by mathematicians
during the 1930s in Vienna and at Harvard, notably by Karl Menger, who
defines the problem, considers the obvious brute-force algorithm, and
observes the non-optimality of the nearest neighbour heuristic.


It was first considered mathematically in the 1930s by Merrill M.
Flood who was looking to solve a school bus routing problem.
Hassler Whitney at Princeton University introduced the name
'travelling salesman problem' soon afterward.

In the 1950s and 1960s, the problem became increasingly popular in
scientific circles in Europe and the USA after the RAND Corporation in
Santa Monica offered prizes for steps in solving the problem. Notable
contributions were made by George Dantzig, Delbert Ray Fulkerson and
Selmer M. Johnson from the RAND Corporation, who expressed the problem
as an integer linear program and developed the cutting plane method
for its solution. They wrote what is considered the seminal paper on
the subject, in which they used these new methods to solve an instance
with 49 cities to optimality by constructing a tour and proving that
no other tour could be shorter. Dantzig, Fulkerson and Johnson also
speculated that, given a near-optimal solution, one may be able to
find an optimal solution or prove optimality by adding a small number
of extra inequalities (cuts). They used this idea to solve their
initial 49-city problem using a string model, and found that they
needed only 26 cuts to reach a solution. While this paper did not give
an algorithmic approach to TSP problems, its ideas were indispensable
to the later development of exact solution methods for the TSP, though
it would take 15 years to find an algorithmic approach to creating
these cuts. As well as cutting plane methods, Dantzig, Fulkerson and
Johnson used branch and bound algorithms, perhaps for the first time.

In the following decades, the problem was studied by many researchers
from mathematics, computer science, chemistry, physics, and other
sciences. In the 1960s, however, a new approach was created: instead
of seeking optimal solutions, one would produce a solution whose
length is provably bounded by a multiple of the optimal length, and in
doing so create lower bounds for the problem; these may then be used
with branch and bound approaches. One method of doing this was to
create a minimum spanning tree of the graph and then double all its
edges, which produces the bound that the length of an optimal tour is
at most twice the weight of a minimum spanning tree.

Christofides made a big advance in this line of work by giving an
algorithm with a known worst-case guarantee. The Christofides
algorithm, given in 1976, produces a tour that is at worst 1.5 times
as long as the optimal solution. As the algorithm was so simple and
quick, many hoped it would give way to a near-optimal solution method.
This remains the method with the best worst-case scenario. However,
for a fairly general special case of the problem it was beaten by a
tiny margin in 2011.

Richard M. Karp showed in 1972 that the Hamiltonian cycle problem was
NP-complete, which implies the NP-hardness of TSP. This supplied a
mathematical explanation for the apparent computational difficulty of
finding optimal tours.

Great progress was made in the late 1970s and 1980s, when Grötschel,
Padberg, Rinaldi and others managed to exactly solve instances with up
to 2,392 cities, using cutting planes and branch and bound.

In the 1990s, Applegate, Bixby, Chvátal, and Cook developed the
program 'Concorde' that has been used in many recent record solutions.
Gerhard Reinelt published the TSPLIB in 1991, a collection of
benchmark instances of varying difficulty, which has been used by many
research groups for comparing results. In 2006, Cook and others
computed an optimal tour through an 85,900-city instance given by a
microchip layout problem, currently the largest solved TSPLIB
instance. For many other instances with millions of cities, solutions
can be found that are guaranteed to be within 2-3% of an optimal tour.


As a graph problem
====================
TSP can be modelled as an undirected weighted graph, such that cities
are the graph's vertices, paths are the graph's edges, and a path's
distance is the edge's weight. It is a minimization problem starting
and finishing at a specified vertex after having visited each other
vertex exactly once. Often, the model is a complete graph ('i.e.' each
pair of vertices is connected by an edge). If no path exists between
two cities, adding an arbitrarily long edge will complete the graph
without affecting the optimal tour.


Asymmetric and symmetric
==========================
In the 'symmetric TSP', the distance between two cities is the same in
each opposite direction, forming an undirected graph. This symmetry
halves the number of possible solutions. In the 'asymmetric TSP',
paths may not exist in both directions or the distances might be
different, forming a directed graph. Traffic collisions, one-way
streets, and airfares for cities with different departure and arrival
fees are examples of how this symmetry could break down.


Related problems
==================
* An equivalent formulation in terms of graph theory is: Given a
complete weighted graph (where the vertices would represent the
cities, the edges would represent the roads, and the weights would be
the cost or distance of that road), find a Hamiltonian cycle with the
least weight.
* The requirement of returning to the starting city does not change
the computational complexity of the problem, see Hamiltonian path
problem.
* Another related problem is the Bottleneck traveling salesman problem
(bottleneck TSP): Find a Hamiltonian cycle in a weighted graph with
the minimal weight of the weightiest edge. For example, avoiding
narrow streets with big buses. The problem is of considerable
practical importance, apart from evident transportation and logistics
areas. A classic example is in printed circuit manufacturing:
scheduling of a route of the drill machine to drill holes in a PCB. In
robotic machining or drilling applications, the "cities" are parts to
machine or holes (of different sizes) to drill, and the "cost of
travel" includes time for retooling the robot (single machine job
sequencing problem).
* The generalized travelling salesman problem, also known as the
"travelling politician problem", deals with "states" that have (one or
more) "cities" and the salesman has to visit exactly one "city" from
each "state". One application is encountered in ordering a solution to
the cutting stock problem in order to minimize knife changes. Another
is concerned with drilling in semiconductor manufacturing.
Noon and Bean demonstrated that the generalized travelling salesman
problem can be transformed into a standard travelling salesman problem
with the same number of cities, but a modified distance matrix.
* The sequential ordering problem deals with the problem of visiting a
set of cities where precedence relations between the cities exist.
* A common interview question at Google is how to route data among
data processing nodes; routes vary by time to transfer the data, but
nodes also differ by their computing power and storage, compounding
the problem of where to send data.
* The travelling purchaser problem deals with a purchaser who is
charged with purchasing a set of products. He can purchase these
products in several cities, but at different prices and not all cities
offer the same products. The objective is to find a route between a
subset of the cities, which minimizes total cost (travel cost +
purchasing cost) and which enables the purchase of all required
products.


              Integer linear programming formulations
======================================================================
The TSP can be formulated as an integer linear program. Several
formulations are known. Two notable formulations are the
Miller-Tucker-Zemlin (MTZ) formulation and the
Dantzig-Fulkerson-Johnson (DFJ) formulation. The DFJ formulation is
stronger, though the MTZ formulation is still useful in certain
settings.


Miller-Tucker-Zemlin formulation
==================================
Label the cities with the numbers 1, ..., 'n' and define:

: x_{ij} = \begin{cases} 1 & \text{the path goes from city } i
\text{ to city } j \\ 0 & \text{otherwise} \end{cases}

For 'i' = 1, ..., 'n', let u_i be a dummy variable, and finally take
c_{ij} to be the distance from city 'i' to city 'j'. Then TSP can be
written as the following integer linear programming problem:

:\begin{align}
\min &\sum_{i=1}^n \sum_{j\ne i,j=1}^nc_{ij}x_{ij}\colon
&&  \\
& x_{ij} \in \{0,1\}  && i,j=1, \ldots, n; \\
& u_{i} \in \mathbf{Z} && i=2, \ldots, n; \\
& \sum_{i=1,i\ne j}^n x_{ij} = 1 && j=1, \ldots, n;
\\
& \sum_{j=1,j\ne i}^n x_{ij} = 1 && i=1, \ldots, n;
\\
& u_i-u_j +nx_{ij} \le n-1 && 2 \le i \ne j \le n;
\\
& 0 \le u_i \le n-1 && 2 \le i \le n.
\end{align}

The first set of equalities requires that each city is arrived at from
exactly one other city, and the second set of equalities requires that
from each city there is a departure to exactly one other city. The
last constraints enforce that there is only a single tour covering all
cities, and not two or more disjointed tours that only collectively
cover all cities. To prove this, it is shown below (1) that every
feasible solution contains only one closed sequence of cities, and (2)
that for every single tour covering all cities, there are values for
the dummy variables u_i that satisfy the constraints.

To prove that every feasible solution contains only one closed
sequence of cities, it suffices to show that every subtour in a
feasible solution passes through city 1 (noting that the equalities
ensure there can only be one such tour). For if we sum all the
inequalities corresponding to x_{ij}=1 for any subtour of 'k' steps
not passing through city 1, the u_i terms cancel around the cycle and
we obtain:

:nk \leq (n-1)k,

which is a contradiction.

It now must be shown that for every single tour covering all cities,
there are values for the dummy variables u_i that satisfy the
constraints.

Without loss of generality, define the tour as originating (and
ending) at city 1. Choose u_{i}=t if city 'i' is visited in step 't',
counting city 1 as step 0 ('i' = 2, ..., 'n'; 't' = 1, 2, ...,
'n' - 1), so that the bounds 0 \le u_i \le n-1 hold. Then

:u_i-u_j\le n-1,

since u_i can be no greater than 'n' - 1 and u_j can be no less than
1; hence the constraints are satisfied whenever x_{ij}=0. For
x_{ij}=1, we have:

:  u_{i} - u_{j} + nx_{ij} = (t) - (t+1) + n = n-1,

satisfying the constraint.
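
The two halves of this argument can be checked numerically on a small
case. The sketch below is illustrative only: the function names, the
successor-map representation (succ[i] is the city visited after city
'i') and the two 5-city examples are assumptions made for this example.
Labels are tour positions counted from 0 at city 1, so that they stay
within the bounds 0 to 'n' - 1.

```python
from itertools import product

def mtz_feasible(succ, u):
    """Check the MTZ subtour constraints for a successor map and labels u.

    succ[i] is the city visited right after city i (cities are 1..n);
    u maps each city 2..n to an integer label.
    """
    n = len(succ)
    cities = range(2, n + 1)
    if any(not 0 <= u[i] <= n - 1 for i in cities):
        return False
    for i in cities:
        for j in cities:
            if i != j:
                x_ij = 1 if succ[i] == j else 0
                if u[i] - u[j] + n * x_ij > n - 1:
                    return False
    return True

def labels_from_tour(succ):
    """Label each city with its position in the tour, city 1 at position 0."""
    u, city, step = {}, succ[1], 1
    while city != 1:
        u[city] = step
        city, step = succ[city], step + 1
    return u

def any_labels_work(succ):
    """Brute-force search over all label vectors with values in 0..n-1."""
    n = len(succ)
    return any(
        mtz_feasible(succ, dict(zip(range(2, n + 1), values)))
        for values in product(range(n), repeat=n - 1)
    )

# Hypothetical examples: one tour 1-3-5-2-4-1, and two disjoint subtours.
single_tour = {1: 3, 3: 5, 5: 2, 2: 4, 4: 1}
two_subtours = {1: 2, 2: 1, 3: 4, 4: 5, 5: 3}
```

For the single tour, the position labels satisfy every constraint; for
the two subtours, the exhaustive search finds no valid labels, as the
summation argument predicts for the cycle 3-4-5 that avoids city 1.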


Dantzig-Fulkerson-Johnson formulation
=======================================
Label the cities with the numbers 1, ..., 'n' and define:

: x_{ij} = \begin{cases} 1 & \text{the path goes from city } i
\text{ to city } j \\ 0 & \text{otherwise} \end{cases}

Take c_{ij} to be the distance from city 'i' to city 'j'. Then TSP can
be written as the following integer linear programming problem:

:\begin{align}
\min &\sum_{i=1}^n \sum_{j\ne i,j=1}^nc_{ij}x_{ij}\colon
&&  \\
& x_{ij} \in \{0,1\}  && i,j=1, \ldots, n; \\
& \sum_{i=1,i\ne j}^n x_{ij} = 1 && j=1, \ldots, n;
\\
& \sum_{j=1,j\ne i}^n x_{ij} = 1 && i=1, \ldots, n;
\\
& \sum_{i \in Q}{\sum_{j \in Q}{x_{ij}}} \leq |Q| - 1
&& \forall Q \subsetneq \{1, \ldots, n\}, |Q| \geq 2 \\
\end{align}

The last constraint of the DFJ formulation ensures that there are no
sub-tours among the non-starting vertices, so the solution returned is
a single tour and not the union of smaller tours. Because this leads
to an exponential number of possible constraints, in practice it is
solved with row generation: subtour constraints are added only when a
candidate solution violates them.
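
Since the subtour constraints are too numerous to list up front, a
solver needs a routine that finds a violated one for a given candidate
solution. The sketch below covers only the simplest case, an integer
solution that already satisfies the two sets of equalities; the
successor-map representation and function names are assumptions for
illustration, not part of any particular solver.

```python
def find_subtours(succ):
    """Split a successor map (one outgoing arc per city) into its cycles."""
    unvisited, cycles = set(succ), []
    while unvisited:
        start = min(unvisited)
        cycle, city = [], start
        while city in unvisited:
            unvisited.remove(city)
            cycle.append(city)
            city = succ[city]
        cycles.append(cycle)
    return cycles

def violated_dfj_sets(succ):
    """Return every city set Q whose cycle violates sum x_ij <= |Q| - 1.

    A cycle on Q uses exactly |Q| arcs inside Q, so any cycle shorter
    than the full tour gives a violated constraint.
    """
    n = len(succ)
    return [set(c) for c in find_subtours(succ) if len(c) < n]

# Hypothetical 5-city candidate made of two disjoint subtours.
candidate = {1: 2, 2: 1, 3: 4, 4: 5, 5: 3}
```

Here violated_dfj_sets(candidate) reports the sets {1, 2} and
{3, 4, 5}; a single tour through all cities yields an empty list.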


                        Computing a solution
======================================================================
The traditional lines of attack for NP-hard problems are the
following:
* Devising exact algorithms, which work reasonably fast only for small
problem sizes.
* Devising "suboptimal" or heuristic algorithms, i.e., algorithms that
deliver approximated solutions in a reasonable time.
* Finding special cases for the problem ("subproblems") for which
either better or exact heuristics are possible.


Exact algorithms
==================
The most direct solution would be to try all permutations (ordered
combinations) and see which one is cheapest (using brute-force
search). The running time for this approach lies within a polynomial
factor of O(n!), the factorial of the number of cities, so this
solution becomes impractical even for only 20 cities.
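
A minimal sketch of the brute-force search follows. The distance
matrix DIST is a made-up 4-city instance, and the function names are
chosen for this example. City 0 is fixed as the origin, so only the
orderings of the remaining cities are enumerated.

```python
from itertools import permutations

def tour_length(dist, tour):
    """Total length of the closed tour, including the edge back to the start."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def brute_force_tsp(dist):
    """Try every ordering of cities 1..n-1 with city 0 fixed as the origin."""
    n = len(dist)
    best_tour, best_len = None, float("inf")
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        length = tour_length(dist, tour)
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len

# Hypothetical 4-city symmetric instance (made-up distances).
DIST = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

print(brute_force_tsp(DIST))  # ((0, 1, 3, 2), 18)
```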

One of the earliest applications of dynamic programming is the
Held-Karp algorithm, which solves the problem in time O(n^2 2^n). This
bound has also been reached by inclusion-exclusion in an attempt
preceding the dynamic programming approach.
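
The dynamic programme can be sketched as follows. This is a compact
illustration rather than a tuned implementation: subsets of cities are
encoded as bitmasks, city 0 is taken as the origin, and DIST is a
made-up instance. Only the optimal length is returned; recovering the
tour itself would require storing predecessors as well.

```python
from itertools import combinations

def held_karp(dist):
    """Length of an optimal tour via Held-Karp dynamic programming."""
    n = len(dist)
    if n == 1:
        return 0
    # best[(S, j)]: shortest path that starts at city 0, visits every city
    # in the bitmask S exactly once, and ends at city j (j in S, 0 not in S).
    best = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev = mask & ~(1 << j)
                best[(mask, j)] = min(
                    best[(prev, k)] + dist[k][j] for k in subset if k != j
                )
    full = (1 << n) - 2  # every city except city 0
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))

# Hypothetical 4-city symmetric instance (made-up distances).
DIST = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

print(held_karp(DIST))  # 18
```

There are on the order of n 2^n states and each is computed from up to
'n' predecessors, which gives the O(n^2 2^n) running time.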

Improving these time bounds seems to be difficult. For example, it has
not been determined whether an exact algorithm for TSP that runs in
time O(1.9999^n) exists.

Other approaches include:
* Various branch-and-bound algorithms, which can be used to process
TSPs containing 40-60 cities.
* Progressive improvement algorithms which use techniques reminiscent
of linear programming. These work well for up to 200 cities.
* Implementations of branch-and-bound and problem-specific cut
generation (branch-and-cut); this is the method of choice for solving
large instances. This approach holds the current record, solving an
instance with 85,900 cities.

An exact solution for 15,112 German towns from TSPLIB was found in
2001 using the cutting-plane method proposed by George Dantzig, Ray
Fulkerson, and Selmer M. Johnson in 1954, based on linear programming.
The computations were performed on a network of 110 processors located
at Rice University and Princeton University. The total computation
time was equivalent to 22.6 years on a single 500 MHz Alpha processor.
In May 2004, the travelling salesman problem of visiting all 24,978
towns in Sweden was solved: a tour of length approximately 72,500
kilometres was found and it was proven that no shorter tour exists. In
March 2005, the travelling salesman problem of visiting all 33,810
points in a circuit board was solved using 'Concorde TSP Solver': a
tour of length 66,048,945 units was found and it was proven that no
shorter tour exists. The computation took approximately 15.7 CPU-years
(Cook et al. 2006). In April 2006 an instance with 85,900 points was
solved using 'Concorde TSP Solver', taking over 136 CPU-years.


Heuristic and approximation algorithms
========================================
Various heuristics and approximation algorithms, which quickly yield
good solutions, have been devised. Modern methods can find solutions
for extremely large problems (millions of cities) within a reasonable
time which are with a high probability just 2-3% away from the optimal
solution.

Several categories of heuristics are recognized.


Constructive heuristics
=========================
The nearest neighbour (NN) algorithm (a greedy algorithm) lets the
salesman choose the nearest unvisited city as his next move. This
algorithm quickly yields an effectively short route. For N cities
randomly distributed on a plane, the algorithm on average yields a
path 25% longer than the shortest possible path. However, there exist
many specially arranged city distributions which make the NN algorithm
give the worst route. This is true for both asymmetric and symmetric
TSPs. Rosenkrantz et al. showed that the NN algorithm has the
approximation factor \Theta(\log |V|) for instances satisfying the
triangle inequality. A variation of the NN algorithm, called the
Nearest Fragment (NF) operator, which connects a group (fragment) of
nearest unvisited cities, can find shorter routes with successive
iterations. The NF operator can also be applied to an initial solution
obtained by the NN algorithm for further improvement in an elitist
model, where only better solutions are accepted.
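
The basic NN rule (without the NF refinement) can be sketched as
below. The example assumes cities given as points in the plane with
Euclidean distances and Python 3.8+ for math.dist; the POINTS list is
made up, and ties are broken by the lower city index.

```python
import math

def nearest_neighbour_tour(points, start=0):
    """Greedy tour: always move to the closest city not yet visited."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        # Break distance ties by city index so the result is deterministic.
        nxt = min(unvisited, key=lambda c: (math.dist(last, points[c]), c))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

# Hypothetical cities on a line.
POINTS = [(0, 0), (1, 0), (3, 0), (6, 0), (-2, 0)]

print(nearest_neighbour_tour(POINTS))  # [0, 1, 2, 3, 4]
```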

The bitonic tour of a set of points is the minimum-perimeter monotone
polygon that has the points as its vertices; it can be computed
efficiently by dynamic programming.

Another constructive heuristic, Match Twice and Stitch (MTS), performs
two sequential matchings, where the second matching is executed after
deleting all the edges of the first matching, to yield a set of
cycles. The cycles are then stitched to produce the final tour.


Christofides algorithm
========================
The Christofides algorithm follows an outline similar to the
spanning-tree doubling method but combines the minimum spanning tree
with a solution of another problem, minimum-weight perfect matching.
This gives a TSP tour which is at most 1.5 times the optimal. The
Christofides algorithm was one of the
first approximation algorithms, and was in part responsible for
drawing attention to approximation algorithms as a practical approach
to intractable problems. As a matter of fact, the term "algorithm" was
not commonly extended to approximation algorithms until later; the
Christofides algorithm was initially referred to as the Christofides
heuristic.

This algorithm looks at things differently by using a result from
graph theory which helps improve on the bound for the TSP that comes
from doubling the cost of the minimum spanning tree. Given an Eulerian
graph, we can find an Eulerian tour in linear time. So if we had an
Eulerian graph with the cities from a TSP as vertices, we could use
such a method for finding an Eulerian tour to find a TSP solution. By
the triangle inequality, the TSP tour can be no longer than the
Eulerian tour, and as such we have a bound for the TSP. Such a method
is described below.
#  Find a minimum spanning tree for the problem
#  Create duplicates for every edge to create an Eulerian graph
#  Find an Eulerian tour for this graph
#  Convert to TSP: if a city is visited twice, create a shortcut from
the city before this in the tour to the one after this.
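
The four steps above can be sketched compactly, assuming Euclidean
points and Python 3.8+ for math.dist. Walking the doubled tree and
skipping cities already seen visits the vertices in depth-first
preorder, so the sketch builds the tree with Prim's algorithm and
returns a preorder walk instead of constructing the Eulerian tour
explicitly. The function names and the SQUARE instance are
illustrative.

```python
import math

def double_tree_tour(points):
    """Tour from a preorder walk of a minimum spanning tree rooted at city 0."""
    n = len(points)
    # Prim's algorithm on the complete Euclidean graph, O(n^2).
    parent = [None] * n
    cost = [math.inf] * n
    cost[0] = 0.0
    in_tree = [False] * n
    children = [[] for _ in range(n)]
    for _ in range(n):
        v = min((c for c in range(n) if not in_tree[c]), key=lambda c: cost[c])
        in_tree[v] = True
        if parent[v] is not None:
            children[parent[v]].append(v)
        for w in range(n):
            d = math.dist(points[v], points[w])
            if not in_tree[w] and d < cost[w]:
                cost[w], parent[w] = d, v
    # Shortcutting the Eulerian tour of the doubled tree is the same as
    # listing the tree's vertices in depth-first preorder.
    tour, stack = [], [0]
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour

def closed_length(points, tour):
    """Length of the tour including the edge back to the first city."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

# Hypothetical instance: the corners of a unit square.
SQUARE = [(0, 0), (0, 1), (1, 1), (1, 0)]

print(double_tree_tour(SQUARE))  # [0, 1, 2, 3]
```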

To improve the bound, a better way of creating an Eulerian graph is
needed. By the triangle inequality, the best Eulerian graph must have
the same cost as the best travelling salesman tour, hence finding
optimal Eulerian graphs is at least as hard as TSP. One way of doing
this is by minimum weight matching using algorithms of O(n^3).

Making a graph into an Eulerian graph starts with the minimum spanning
tree. Then all the vertices of odd degree must be made even. So a
matching for the odd-degree vertices must be added, which increases
the degree of every odd-degree vertex by one. This leaves us with a
graph where every vertex is of even degree, which is thus Eulerian.
Adapting the above method gives Christofides' algorithm:

# Find a minimum spanning tree for the problem
# Create a minimum-weight perfect matching on the set of cities of odd
degree in the tree, and add its edges to the tree
# Find an Eulerian tour for this graph
# Convert to TSP using shortcuts.


Pairwise exchange
===================
The pairwise exchange or '2-opt' technique involves iteratively
removing two edges and replacing these with two different edges that
reconnect the fragments created by edge removal into a new and shorter
tour. Similarly, the 3-opt technique removes 3 edges and reconnects
them to form a shorter tour. These are special cases of the 'k'-opt
method. The label 'Lin-Kernighan' is an often heard misnomer for
2-opt. Lin-Kernighan is actually the more general k-opt method.

For Euclidean instances, 2-opt heuristics give on average solutions
that are about 5% better than Christofides' algorithm. If we start
with an initial solution made with a greedy algorithm, the average
number of moves greatly decreases compared with a random start.
Although the difference is small in order of growth, the initial
number of moves for small problems is 10 times as big for a random
start compared to one made from a greedy heuristic. This is because
such 2-opt heuristics exploit 'bad' parts of a solution such as
crossings. These types of heuristics are often used within vehicle
routing problem heuristics to reoptimize route solutions.
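
A basic 2-opt loop can be sketched as follows, assuming Euclidean
points and Python 3.8+ for math.dist. Removing the edges after
positions 'i' and 'j' and reconnecting the two fragments amounts to
reversing the segment between them. The function names and the
4-point example (a tour that crosses itself) are illustrative.

```python
import math

def tour_length(points, tour):
    """Length of the closed tour, including the edge back to the start."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def two_opt(points, tour):
    """Repeat improving 2-opt moves until none is left (a local optimum)."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share city tour[0]
                a, b = points[tour[i]], points[tour[i + 1]]
                c, d = points[tour[j]], points[tour[(j + 1) % n]]
                # Change in length if edges (a, b), (c, d) become (a, c), (b, d).
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    improved = True
    return tour

# Hypothetical unit square visited in an order that crosses itself.
POINTS = [(0, 0), (1, 1), (1, 0), (0, 1)]

print(two_opt(POINTS, [0, 1, 2, 3]))  # [0, 2, 1, 3]
```

The single improving move removes both diagonals, shortening the tour
from 2 + 2*sqrt(2) to 4.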


'k'-opt heuristic, or Lin-Kernighan heuristics
================================================
The Lin-Kernighan heuristic is a special case of the 'V'-opt or
variable-opt technique. It involves the following steps:

# Given a tour, delete 'k' mutually disjoint edges.
# Reassemble the remaining fragments into a tour, leaving no disjoint
subtours (that is, don't connect a fragment's endpoints together).
This in effect simplifies the TSP under consideration into a much
simpler problem.
# Each fragment endpoint can be connected to 2'k' - 2 other possibilities: of
2'k' total fragment endpoints available, the two endpoints of the
fragment under consideration are disallowed. Such a constrained
2'k'-city TSP can then be solved with brute force methods to find the
least-cost recombination of the original fragments.

The most popular of the 'k'-opt methods is 3-opt, as introduced by
Shen Lin of Bell Labs in 1965. A special case of 3-opt is where the
edges are not disjoint (two of the edges are adjacent to one another).
In practice, it is often possible to achieve substantial improvement
over 2-opt without the combinatorial cost of the general 3-opt by
restricting the 3-changes to this special subset where two of the
removed edges are adjacent. This so-called two-and-a-half-opt
typically falls roughly midway between 2-opt and 3-opt, both in terms
of the quality of tours achieved and the time required to achieve
those tours.


'V'-opt heuristic
===================
The variable-opt method is related to, and a generalization of, the
'k'-opt method. Whereas the 'k'-opt methods remove a fixed number
('k') of edges from the original tour, the variable-opt methods do not
fix the size of the edge set to remove. Instead they grow the set as
the search process continues. The best known method in this family is
the Lin-Kernighan method (mentioned above as a misnomer for 2-opt).
Shen Lin and Brian Kernighan first published their method in 1972, and
it was the most reliable heuristic for solving travelling salesman
problems for nearly two decades. More advanced variable-opt methods
were developed at Bell Labs in the late 1980s by David Johnson and his
research team. These methods (sometimes called Lin-Kernighan-Johnson)
build on the Lin-Kernighan method, adding ideas from tabu search and
evolutionary computing. The basic Lin-Kernighan technique gives
results that are guaranteed to be at least 3-opt. The
Lin-Kernighan-Johnson methods compute a Lin-Kernighan tour, and then
perturb the tour by what has been described as a mutation that removes
at least four edges and reconnects the tour in a different way, then
'V'-opting the new tour. The mutation is often enough to move the tour
from the local minimum identified by Lin-Kernighan. 'V'-opt methods
are widely considered the most powerful heuristics for the problem,
and are able to address special cases, such as the Hamilton Cycle
Problem and other non-metric TSPs that other heuristics fail on. For
many years Lin-Kernighan-Johnson had identified optimal solutions for
all TSPs where an optimal solution was known and had identified the
best known solutions for all other TSPs on which the method had been
tried.


Randomized improvement
========================
Optimized Markov chain algorithms which use local searching heuristic
sub-algorithms can find a route extremely close to the optimal route
for 700 to 800 cities.

TSP is a touchstone for many general heuristics devised for
combinatorial optimization such as genetic algorithms, simulated
annealing, tabu search, ant colony optimization, river formation
dynamics (see swarm intelligence) and the cross entropy method.


Ant colony optimization
=========================
Artificial intelligence researcher Marco Dorigo described in 1993 a
method of heuristically generating "good solutions" to the TSP using a
simulation of an ant colony called 'ACS' ('ant colony system'). It
models behaviour observed in real ants to find short paths between
food sources and their nest, an emergent behaviour resulting from each
ant's preference to follow trail pheromones deposited by other ants.

ACS sends out a large number of virtual ant agents to explore many
possible routes on the map. Each ant probabilistically chooses the
next city to visit based on a heuristic combining the distance to the
city and the amount of virtual pheromone deposited on the edge to the
city. The ants explore, depositing pheromone on each edge that they
cross, until they have all completed a tour. At this point the ant
which completed the shortest tour deposits virtual pheromone along its
complete tour route ('global trail updating'). The amount of pheromone
deposited is inversely proportional to the tour length: the shorter
the tour, the more it deposits.
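
A much simplified sketch of this scheme is given below. It is not
Dorigo's exact ACS: the per-edge deposit made while ants are exploring
is omitted, only the global trail update by the best ant of each round
is kept, and the parameters (alpha, beta, evaporation, number of ants
and rounds) are assumed values chosen for illustration. Cities are
assumed to be at least two distinct points in the plane, with Python
3.8+ for math.dist.

```python
import math
import random

def ant_colony_tour(points, n_ants=10, n_rounds=30, alpha=1.0, beta=2.0,
                    evaporation=0.1, seed=0):
    """Simplified ant colony search; returns (best tour, its length)."""
    rng = random.Random(seed)
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    pheromone = [[1.0] * n for _ in range(n)]
    best_tour, best_len = None, math.inf
    for _ in range(n_rounds):
        round_best, round_len = None, math.inf
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                here = tour[-1]
                options = sorted(unvisited)
                # Prefer edges that are short and carry much pheromone.
                weights = [
                    pheromone[here][c] ** alpha * (1.0 / dist[here][c]) ** beta
                    for c in options
                ]
                nxt = rng.choices(options, weights=weights)[0]
                unvisited.remove(nxt)
                tour.append(nxt)
            length = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
            if length < round_len:
                round_best, round_len = tour, length
        if round_len < best_len:
            best_tour, best_len = round_best, round_len
        # Global trail updating: evaporate everywhere, then let the best
        # ant of the round deposit an amount proportional to 1 / length.
        for row in pheromone:
            for j in range(n):
                row[j] *= 1.0 - evaporation
        for i in range(n):
            a, b = round_best[i], round_best[(i + 1) % n]
            pheromone[a][b] += 1.0 / round_len
            pheromone[b][a] += 1.0 / round_len
    return best_tour, best_len

# Hypothetical instance: the corners of a unit square.
SQUARE = [(0, 0), (0, 1), (1, 1), (1, 0)]
tour, length = ant_colony_tour(SQUARE)
```

The fixed seed only makes runs repeatable; the result is a heuristic
tour with no optimality guarantee.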


Metric
========
In the 'metric TSP', also known as 'delta-TSP' or Δ-TSP, the intercity
distances satisfy the triangle inequality.

A very natural restriction of the TSP is to require that the distances
between cities form a metric and so satisfy the triangle inequality;
that is, the direct connection from 'A' to 'B' is never farther than
the route via intermediate 'C':
:d_{AB} \le d_{AC} + d_{CB}.

The edge lengths then define a metric on the set of vertices. When the
cities are viewed as points in the plane, many natural distance
functions are metrics, and so many natural instances of TSP satisfy
this constraint.

The following are some examples of metric TSPs for various metrics.
*In the Euclidean TSP (see below) the distance between two cities is
the Euclidean distance between the corresponding points.
*In the rectilinear TSP the distance between two cities is the sum of
the absolute values of the differences of their 'x'- and
'y'-coordinates. This metric is often called the Manhattan distance or
city-block metric.
*In the maximum metric, the distance between two points is the maximum
of the absolute values of differences of their 'x'- and
'y'-coordinates.

The last two metrics appear, for example, in routing a machine that
drills a given set of holes in a printed circuit board. The Manhattan
metric corresponds to a machine that adjusts first one co-ordinate,
and then the other, so the time to move to a new point is the sum of
both movements. The maximum metric corresponds to a machine that
adjusts both co-ordinates simultaneously, so the time to move to a new
point is the slower of the two movements.
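
The three metrics can be written directly as functions of two points
in the plane. The function names below are chosen for this example.

```python
def euclidean(p, q):
    """Straight-line distance."""
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def manhattan(p, q):
    """Rectilinear distance: one coordinate is adjusted, then the other."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def maximum_metric(p, q):
    """Maximum metric: both coordinates move at once; the slower one decides."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

a, b = (0, 0), (3, 4)
print(euclidean(a, b), manhattan(a, b), maximum_metric(a, b))  # 5.0 7 4
```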

In its definition, the TSP does not allow cities to be visited twice,
but many applications do not need this constraint. In such cases, a
symmetric, non-metric instance can be reduced to a metric one. This
replaces the original graph with a complete graph in which the
inter-city distance d_{AB} is replaced by the shortest path between
'A' and 'B' in the original graph.
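
This reduction is an all-pairs shortest-path computation. The sketch
below uses the Floyd-Warshall algorithm, one of several possible
choices; the function name and the 3-city weight matrix are
illustrative, and a missing edge would be represented by math.inf.

```python
def metric_closure(weights):
    """Replace each weight by the shortest-path distance (Floyd-Warshall)."""
    n = len(weights)
    d = [row[:] for row in weights]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

# Hypothetical non-metric instance: going 0 -> 1 -> 2 beats the direct edge.
W = [
    [0, 1, 10],
    [1, 0, 2],
    [10, 2, 0],
]

print(metric_closure(W))  # [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
```

The resulting distances satisfy the triangle inequality by
construction.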


Euclidean
===========
When the input numbers can be arbitrary real numbers, Euclidean TSP is
a particular case of metric TSP, since distances in a plane obey the
triangle inequality. When the input numbers must be integers,
comparing lengths of tours involves comparing sums of square-roots.

Like the general TSP, Euclidean TSP is NP-hard in either case. With
rational coordinates and discretized metric (distances rounded up to
an integer), the problem is NP-complete. With rational coordinates and
the actual Euclidean metric, Euclidean TSP is known to be in the
Counting Hierarchy, a subclass of PSPACE. With arbitrary real
coordinates, Euclidean TSP cannot be in such classes, since there are
uncountably many possible inputs. However, Euclidean TSP is probably
the easiest version for approximation. For example, the minimum
spanning tree of the graph associated with an instance of the
Euclidean TSP is a Euclidean minimum spanning tree, and so can be
computed in expected O('n' log 'n') time for 'n' points (considerably
less than the number of edges). This enables the simple
2-approximation algorithm for TSP with triangle inequality above to
operate more quickly.

In general, for any 'c' > 0, where 'd' is the number of dimensions
in the Euclidean space, there is a polynomial-time algorithm that
finds a tour of length at most (1 + 1/'c') times the optimal for
geometric instances of TSP in

:O\left(n (\log n)^{(O(c \sqrt{d}))^{d-1}}\right),

time; this is called a polynomial-time approximation scheme (PTAS).
Sanjeev Arora and Joseph S. B. Mitchell were awarded the Gödel Prize
in 2010 for their concurrent discovery of a PTAS for the Euclidean
TSP.

In practice, simpler heuristics with weaker guarantees continue to be
used.


Asymmetric
============
In most cases, the distance between two nodes in the TSP network is
the same in both directions. The case where the distance from 'A' to
'B' is not equal to the distance from 'B' to 'A' is called asymmetric
TSP. A practical application of an asymmetric TSP is route
optimization using street-level routing (which is made asymmetric by
one-way streets, slip-roads, motorways, etc.).


Conversion to symmetric
=========================
Solving an asymmetric TSP graph can be somewhat complex. One option is
to turn an asymmetric matrix of size 'N' into a symmetric matrix of
size 2'N'. The following is a 3×3 matrix containing all possible path
weights between the nodes 'A', 'B' and 'C'.

:  Asymmetric path weights
        'A'    'B'    'C'
'A'            1      2
'B'     6             3
'C'     5      4

To double the size, each of the nodes in the graph is duplicated,
creating a second 'ghost node', linked to the original node with a
"ghost" edge of very low (possibly negative) weight, here denoted
−'w'. (Alternatively, the ghost edges have weight 0, and weight 'w' is
added to all other edges.) The original 3×3 matrix shown above is
visible in the bottom left and the transpose of the original in the
top-right. Both copies of the matrix have had their diagonals replaced
by the low-cost hop paths, represented by −'w'. In the new graph, no
edge directly links original nodes and no edge directly links ghost
nodes.

:  Symmetric path weights
        'A'    'B'    'C'    'A′'   'B′'   'C′'
'A'                          −'w'   6      5
'B'                          1      −'w'   4
'C'                          2      3      −'w'
'A′'    −'w'   1      2
'B′'    6      −'w'   3
'C′'    5      4      −'w'

The weight −'w' of the "ghost" edges linking the ghost nodes to the
corresponding original nodes must be low enough to ensure that all
ghost edges belong to any optimal symmetric TSP solution on the
new graph ('w' = 0 is not always low enough). As a consequence, in the
optimal symmetric tour, each original node appears next to its ghost
node (e.g. a possible path is \mathrm{A \to A' \to C \to C' \to B \to
B' \to A}) and by merging the original and ghost nodes again we get an
(optimal) solution of the original asymmetric problem (in our example,
\mathrm{A \to C \to B \to A}).
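
The construction can be sketched in a few lines of Python. This is a minimal illustration rather than a practical solver: it assumes non-negative weights, picks 'w' as one more than the sum of all weights (one choice that is low enough under that assumption), marks the forbidden original-original and ghost-ghost edges with infinite cost, and solves the symmetric instance by brute force. The function names are introduced here for illustration.

```python
from itertools import permutations

INF = float("inf")


def symmetrize(asym, w):
    """Build the 2N x 2N symmetric matrix from an N x N asymmetric one.

    Indices 0..N-1 are the original nodes, N..2N-1 their ghosts.
    """
    n = len(asym)
    sym = [[INF] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            # The diagonal becomes the ghost edge of weight -w.
            cost = -w if i == j else asym[i][j]
            sym[n + i][j] = cost   # bottom-left block: original matrix
            sym[j][n + i] = cost   # top-right block: its transpose
    return sym


def brute_force_tour(dist):
    """Exhaustively find a cheapest closed tour (tiny instances only)."""
    n = len(dist)
    best_cost, best_tour = INF, None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        cost = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
        if cost < best_cost:
            best_cost, best_tour = cost, tour
    return best_cost, best_tour


def solve_atsp_via_symmetric(asym):
    """Solve a small asymmetric TSP through its symmetric counterpart."""
    n = len(asym)
    # Assumption: weights are non-negative, so any w larger than their
    # total forces every ghost edge into the optimal symmetric tour.
    w = sum(asym[i][j] for i in range(n) for j in range(n) if i != j) + 1
    sym_cost, tour = brute_force_tour(symmetrize(asym, w))
    # Orient the cycle so each original node is followed by its ghost.
    if tour[1] != tour[0] + n:
        tour = (tour[0],) + tuple(reversed(tour[1:]))
    # Merging each node with its ghost leaves the asymmetric tour;
    # adding n * w removes the ghost-edge contribution from the cost.
    return sym_cost + n * w, [v for v in tour if v < n]
```

For the 3×3 weights above, `solve_atsp_via_symmetric([[0, 1, 2], [6, 0, 3], [5, 4, 0]])` returns cost 9 with node order `[0, 1, 2]`, that is A → B → C → A. The path quoted in the text illustrates the alternating original/ghost structure; with these particular weights it corresponds to a tour of cost 12.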


Analyst's problem
===================
There is an analogous problem in geometric measure theory which asks
the following: under what conditions may a subset 'E' of Euclidean
space be contained in a rectifiable curve (that is, when is there a
curve with finite length that visits every point in 'E')? This problem
is known as the analyst's travelling salesman problem.


Path length for random sets of points in a square
===================================================
Suppose X_1,\ldots,X_n are n independent random variables with uniform
distribution in the square [0,1]^2, and let L^\ast_n be the shortest
path length (i.e. TSP solution) for this set of points, according to
the usual Euclidean distance. It is known that, almost surely,

::\frac{L^*_n}{\sqrt n}\rightarrow \beta\qquad\text{when }n\to\infty,

where \beta is a positive constant that is not known explicitly. Since
L^*_n\le2\sqrt n+2 (see below), it follows from the bounded
convergence theorem that \beta=\lim_{n\to\infty} \mathbb
E[L^*_n]/\sqrt n, hence lower and upper bounds on \beta follow from
bounds on \mathbb E[L^*_n].

The almost sure limit \frac{L^*_n}{\sqrt n}\rightarrow \beta as
n\to\infty may not exist if the independent locations X_1,\ldots,X_n
are replaced with observations from a stationary ergodic process with
uniform marginals.


Upper bound
=============
*One has L^*_n\le 2\sqrt{n}+2, and therefore \beta\leq 2, by using a
naive path which visits monotonically the points inside each of \sqrt
n slices of width 1/\sqrt{n} in the square.
*Few proved L^*_n\le\sqrt{2n}+1.75, hence \beta\le\sqrt 2, later
improved by Karloff (1987): \beta\le0.984\sqrt2.
*One study reported an upper bound of \beta\le 0.92\dots.
*Another study reported an upper bound of \beta\le 0.73\dots.
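
The naive strip construction behind the first bound can be sketched as follows. This is an illustrative implementation under stated assumptions: points lie in the unit square, the number of slices is taken as the ceiling of \sqrt n, and the function names `strip_path` and `path_length` are introduced here. The result is an open path, matching the "path" wording of the bound.

```python
import math


def strip_path(points):
    """Visit points slice by slice, alternating upward and downward sweeps.

    The unit square is cut into ceil(sqrt(n)) vertical slices; inside
    each slice the points are visited monotonically in y.
    """
    n = len(points)
    if n == 0:
        return []
    k = math.ceil(math.sqrt(n))
    strips = [[] for _ in range(k)]
    for idx, (x, _y) in enumerate(points):
        # A point with x == 1.0 is placed in the last slice.
        strips[min(int(x * k), k - 1)].append(idx)
    order, upward = [], True
    for strip in strips:
        if not strip:
            continue  # empty slices are skipped without changing direction
        strip.sort(key=lambda i: points[i][1], reverse=not upward)
        order.extend(strip)
        upward = not upward
    return order


def path_length(points, order):
    """Euclidean length of the open path through points in this order."""
    return sum(math.dist(points[a], points[b])
               for a, b in zip(order, order[1:]))
```

The vertical travel is at most 1 per slice and each step moves horizontally by at most about one slice width, which is where the two \sqrt n terms in the bound come from.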


Lower bound
=============
*By observing that \mathbb E[L^*_n] is greater than n times the
distance between X_0 and the closest point X_i\ne X_0, one gets (after
a short computation)

::\mathbb E[L^*_n]\ge\tfrac{1}{2} \sqrt{n}.

*A better lower bound is obtained by observing that \mathbb E[L^*_n]
is greater than \tfrac12n times the sum of the distances between X_0
and the closest and second closest points X_i,X_j\ne X_0, which gives

::\mathbb E[L^*_n]\ge \left( \tfrac{1}{4} + \tfrac{3}{8}
\right)\sqrt{n} = \tfrac{5}{8}\sqrt{n}.

*The best lower bound currently known is
::\mathbb E[L^*_n]\ge (\tfrac{5}{8} + \tfrac{19}{5184})\sqrt{n}.

*Held and Karp gave a polynomial-time algorithm that provides
numerical lower bounds for L^*_n, and thus for \beta(\simeq
L^*_n/{\sqrt n}), which appear to be accurate to within about 1%. In
particular, David S. Johnson obtained a lower bound by computer
experiment:

::L^*_n\gtrsim 0.7080\sqrt{n}+0.522,

where 0.522 comes from the points near the square boundary, which have
fewer neighbours. Christine L. Valenzuela and Antonia J. Jones
obtained the following other numerical lower bound:
::L^*_n\gtrsim 0.7078\sqrt{n}+0.551.
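
The "short computation" behind the first two bounds in this list can be sketched as follows. The sketch is heuristic: it assumes that, for large n, the points around X_0 behave like a planar Poisson process of intensity n and it ignores boundary effects. The symbols D_1 and D_2, introduced here, denote the distances from X_0 to its closest and second closest points.

```latex
% No point in the disc of radius r:      P(D_1 > r) = e^{-\pi n r^2}
% At most one point in that disc:        P(D_2 > r) = (1 + \pi n r^2) e^{-\pi n r^2}
\mathbb E[D_1] = \int_0^\infty e^{-\pi n r^2}\,dr = \frac{1}{2\sqrt n},
\qquad
\mathbb E[D_2] = \int_0^\infty \left(1+\pi n r^2\right) e^{-\pi n r^2}\,dr
               = \frac{3}{4\sqrt n}.

% Each point has two tour edges and each edge is shared by two points,
% so the two edges at X_0 have lengths at least D_1 and D_2:
\mathbb E[L^*_n] \ge n\,\mathbb E[D_1] = \tfrac12\sqrt n,
\qquad
\mathbb E[L^*_n] \ge \tfrac12 n\left(\mathbb E[D_1]+\mathbb E[D_2]\right)
  = \left(\tfrac14+\tfrac38\right)\sqrt n = \tfrac58\sqrt n.
```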


                      Computational complexity
======================================================================
The problem has been shown to be NP-hard (more precisely, it is
complete for the complexity class FP^NP; see function problem), and the
decision problem version ("given the costs and a number 'x', decide
whether there is a round-trip route cheaper than 'x'") is NP-complete.
The bottleneck traveling salesman problem is also NP-hard. The problem
remains NP-hard even for the case when the cities are in the plane
with Euclidean distances, as well as in a number of other restrictive
cases. Removing the condition of visiting each city "only once" does
not remove the NP-hardness, since it is easily seen that in the planar
case there is an optimal tour that visits each city only once
(otherwise, by the triangle inequality, a shortcut that skips a
repeated visit would not increase the tour length).


Complexity of approximation
=============================
In the general case, finding a shortest travelling salesman tour is
NPO-complete. If the distance measure is a metric (and thus
symmetric), the problem becomes APX-complete and Christofides's
algorithm approximates it within 1.5.
The best known inapproximability bound is 123/122.

If the distances are restricted to 1 and 2 (but still are a metric),
the approximation ratio becomes 8/7. In the asymmetric case with
triangle inequality, only logarithmic performance guarantees were
known for a long time, such as a performance ratio of 0.814 log('n');
a constant-factor approximation was later given by Svensson,
Tarnawski, and Végh (2018).
The best known inapproximability bound is 75/74.

The corresponding maximization problem of finding the 'longest'
travelling salesman tour is approximable within 63/38. If the distance
function is symmetric, the longest tour can be approximated within 4/3
by a deterministic algorithm and within \tfrac{1}{25}(33+\varepsilon)
by a randomized algorithm.


                    Human and animal performance
======================================================================
The TSP, in particular the Euclidean variant of the problem, has
attracted the attention of researchers in cognitive psychology. It has
been observed that humans are able to produce near-optimal solutions
quickly, in a close-to-linear fashion, with performance that ranges
from 1% less efficient for graphs with 10-20 nodes to 11% less
efficient for graphs with 120 nodes. The apparent ease with which
humans accurately generate near-optimal solutions to the problem has
led researchers to hypothesize that humans use one or more heuristics,
with the two most popular theories arguably being the convex-hull
hypothesis and the crossing-avoidance heuristic. However, additional
evidence suggests that human performance is quite varied, and
individual differences as well as graph geometry appear to affect
performance in the task. Nevertheless, results suggest that computer
performance on the TSP may be improved by understanding and emulating
the methods used by humans for these problems, and have also led to
new insights into the mechanisms of human thought. The first issue of
the 'Journal of Problem Solving' was devoted to the topic of human
performance on TSP, and a 2011 review listed dozens of papers on the
subject.

A 2011 study in animal cognition entitled "Let the Pigeon Drive the
Bus," named after the children's book 'Don't Let the Pigeon Drive the
Bus!', examined spatial cognition in pigeons by studying their flight
patterns between multiple feeders in a laboratory in relation to the
travelling salesman problem. In the first experiment, pigeons were
placed in the corner of a lab room and allowed to fly to nearby
feeders containing peas. The researchers found that pigeons largely
used proximity to determine which feeder they would select next. In
the second experiment, the feeders were arranged in such a way that
flying to the nearest feeder at every opportunity would be largely
inefficient if the pigeons needed to visit every feeder. The results
of the second experiment indicate that pigeons, while still favoring
proximity-based solutions, "can plan several steps ahead along the
route when the differences in travel costs between efficient and less
efficient routes based on proximity become larger." These results are
consistent with other experiments done with non-primates, which have
shown that some non-primates were able to plan complex travel routes.
This suggests non-primates may possess a relatively sophisticated
spatial cognitive ability.


                        Natural computation
======================================================================
When presented with a spatial configuration of food sources, the
amoeboid Physarum polycephalum adapts its morphology to create an
efficient path between the food sources, which can also be viewed as
an approximate solution to TSP. It is considered to present
interesting possibilities and has been studied in the area of natural
computing.


                             Benchmarks
======================================================================
For benchmarking of TSP algorithms,
[http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/ TSPLIB], a
library of sample instances of the TSP and related problems, is
maintained; see the TSPLIB external reference. Many of the instances
are lists of actual cities and layouts of actual printed circuits.


                          Popular culture
======================================================================
* 'Travelling Salesman', by director Timothy Lanzone, is the story of
four mathematicians hired by the U.S. government to solve the most
elusive problem in computer-science history: P vs. NP.


                              See also
======================================================================
* Canadian traveller problem
* Exact algorithm
* Route inspection problem (also known as "Chinese postman problem")
* Set TSP problem
* Seven Bridges of Königsberg
* Steiner travelling salesman problem
* Subway Challenge
* Tube Challenge
* Vehicle routing problem
* Graph exploration


                           External links
======================================================================
*  at University of Waterloo
* [http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/
TSPLIB] at the University of Heidelberg
* '[http://demonstrations.wolfram.com/TravelingSalesmanProblem/
Traveling Salesman Problem]' by Jon McLoone at the Wolfram
Demonstrations Project
* [https://tspvis.com/ TSP visualization tool]


License
=========
All content on Gopherpedia comes from Wikipedia, and is licensed under CC-BY-SA
License URL: http://creativecommons.org/licenses/by-sa/3.0/
Original Article: http://en.wikipedia.org/wiki/Travelling_salesman_problem