Reinforcement Learning Enhanced Heuristic Search for CombinatorialOptimizationReinforcement Learning Enhanced Heuristic Search for CombinatorialOptimization (Reinforcement Learning gebaseerde heuristieken voorcombinatorische optimalisatie)

Wauters, Tony

Author:

Wauters, Tony

Keywords:

Itec, iMinds

Abstract:

The present thesis describes the use of reinforcement learning to enhance heuristic search for solving complex (real-world) optimization problems such as (project) scheduling, routing and assignment. Heuristic search methods are known to deliver good results in a reasonable amount of calculation time, without any guarantee of optimality. Often they require careful parameter tuning to obtain good results. Reinforcement learning methods on the other hand, learn to act in a possible unknown random environment on a trial-and-error basis. The goal of the hybridizationof heuristic search and reinforcement learning is to generate intelligent search methods which are adaptive and generally applicable while keeping eventual extra overhead to a minimum.Three levels of inclusion of reinforcement learning into heuristic search methods are defined: the direct, the metaheuristic and the hyperheuristic level. At the direct level, the reinforcement learning method searches directly for good quality solutions, while at the metaheuristic and hyperheuristic level, the reinforcement learning component is added for learning good starting solutions, good parameter values, good objective functions, good heuristics, etc. For each level, one or more learning enhanced methods are demonstrated on benchmark and/or real-world problems. A general methodology for learning permutations without any domain knowledge is described. Additionally, a method for learning to select heuristics during search is described and tested on several hard combinatorial optimization problems such as the traveling tournament problem, the patient admission scheduling problem, and the machine reassignment problem. It is shown that this learning selection method performs significantly better than selecting the heuristics at random.From an application point of view, this thesis is mainly, though not exclusively, devoted to scheduling problems. We tackled the multi-mode resource-constrained project scheduling problem, the decentralized resource-constrained multi-project scheduling problem, the (dynamic) flexible job shop scheduling problem and a real-world production scheduling problem from the food industry. Real-world problems often hold a rich structure allowing for learning valuable information. We show that a multi-agent reinforcement learning architecture, morespecifically a network of learning automata with a common reward signal, is very well suited to design new hybrid well performing methods. Manynew best results for project scheduling benchmarks are generated using the proposed GT-MAS approach.