Copying garbage collection for WAM-based prolog systems

Vandeginste, Ruben

Author:

Vandeginste, Ruben

Abstract:

High-level programming languages like Prolog free the programmer from the burden of explicit memory management. In Prolog, dynamic memory allocation is done implicitly by creating data structures. The deallocation of data structures which are no longer in use, is the responsibility of the run-time system. Many Prolog implementations are based on the Warren Abstract Machine (WAM), a virtual machine with an accompanying instruction set, optimised for the execution of Prolog code. The WAM has a built-in mechanism, instant reclaiming, to recover memory on the heap at a constant cost upon backtracking. This mechanism alone, however, is not sufficient for large or mostly deterministic applications. As such, most Prolog implementations use a garbage collector for the heap. Mark-slide garbage collection has always been popular in the context of Prolog because of its intrinsic properties. Still, while the cost of mark-slide garbage collection is linear in the total number of heap cells, the cost of copying garbage collection is linear in the number of live heap cells. The survival rate of heap cells in Prolog applications is typically low and this suggests that copying garbage collection is a better choice for Prolog. On the other hand, copying garbage collection does not preserve the cell order and this leads to some disadvantages. In this dissertation, we develop techniques to make copying garbage collection for WAM-based Prolog systems better in several areas. In the context of Prolog, a standard copying algorithm copies heap cells in specific cell configurations more than once. To prevent this, copying collectors for Prolog typically employ a marking phase which precedes the copying phase. We examine under which circumstances heap cells are copied twice or more and present techniques to reduce and limit the number of extra copies. We investigate whether copying without marking is feasible in the context of Prolog. Instant reclaiming depends on the order of the heap segments. Since copying collectors are not order preserving, instant reclaiming is not possible on collected parts of the heap. We develop a technique to preserve the segment order in a copying collector and make instant reclaiming compatible with copying collection. Generational garbage collection improves the efficiency of simple collectors by collecting only those parts of the heap that likely contain the most garbage. We show that, in the context of Prolog, the cost of generational garbage collection is very low when heap segments are used as the basis for generations. This, however, requires that the segment order is preserved during collection. We develop a multi-generational copying garbage collector, based on our segment order preserving copying algorithm. A drawback of simple copying collectors is that during each collection cycle, the whole heap is collected at once. When the heap is large, this leads to big pause times and this is undesirable for certain types of applications. We develop an incremental garbage collector, which collects only a small piece of the heap during each collection cycle with the purpose of minimising the pause time.