Pareto Frontier Approximation Network (PA-Net) to Solve Bi-objective TSP
Ishaan Mehta, Sharareh Taghipour and Sajad Saeedi
The travelling salesperson problem (TSP) is a classic resource allocation problem used to find an optimal order of doing a set of tasks while minimizing (or maximizing) an associated objective function. It is widely used in robotics for applications such as planning and scheduling. In this work, we solve TSP for two objectives using reinforcement learning (RL). Often in multi-objective optimization problems, the associated objective functions can be conflicting in nature. In such cases, the optimality is defined in terms of Pareto optimality. A set of these Pareto optimal solutions in the objective space form a Pareto front (or frontier). Each solution has its trade off. We present Pareto frontier approximation network (PA-Net), a network that generates good approximations of the Pareto front for the bi-objective travelling salesperson problem (BTSP). Firstly, BTSP is converted into a constrained optimization problem. We then train our network to solve this constrained problem using the Lagrangian relaxation and policy gradient. With PA-Net we improve the performance over an existing deep RL-based method. The average improvement in hypervolume metric which is used to measure optimality of the Pareto front is 2.3%. At the same time, PA-Net has 4.5x faster inference time Finally, we present the application of PA-Net to find optimal visiting order in a robotic navigation task/coverage planning.
Path generated by our network for 30 city BTSP
Demonstration of BTSP for robotics
The images below present the solutions of 100 city BTSP generated by our network for a real-world robotic map. The first objective is the total tour length and the second objective represents the violation of pre-assigned priority to various paths in the map. The first image presents the Pareto front generated by our network. The tours corresponding to the points highlighted on the Pareto front are shown in subsequent images. Note that there is a map corresponding to each objective. The city number in the maps is also highlighted. Tour for Point A has the shortest path length but a high priority violation as shown in plot for objective 2. Whereas, tour for Point C has the highest path length but a low priority violation. The tour for Point B shows an intermediate solution between the two extreme cases given by Point A and B.
Convergence to Concave Pareto Fronts
Unlike linear scalarization methods, our framework is capable of finding solutions for Multi objective optimization (MOO) problem with concave pareto fronts. Our method can find solutions for MOO with concave Pareto front for different preferences in objective space. The plot above is for the following problem