ICAPS, Date: 2008/09/14 - 2008/09/18, Location: Sydney, Australia

Publication date: 2008-01-01

ICAPS 2008 workshop on oversubscribed planning & scheduling

Author:

Sanner, Scott; Goetschalckx, Robby; Driessens, Kurt

Keywords:

dynamic programming, Bayesian

Abstract:

Many oversubscribed planning problems with multiple (potentially jointly unachievable) goals can be cast in the Markov decision process (MDP) framework when appropriate reward functions and objective criteria can be defined. Recent real-time, search-based solutions to MDPs maintain upper and lower bounds on the value function at all stages of execution prior to convergence. While many algorithms already use this information to detect convergence and to prioritize state exploration, we propose a novel Bayesian interpretation of these bounds to evaluate the myopic value of perfect information (VPI) of exactly knowing a state’s value. This VPI analysis permits two general Bayesian search control modifications to existing algorithms: (1) we can use it to dynamically adapt the backup depth in real-time, search-based bounded dynamic programming approaches, and (2) we can use it for real-time policy evaluation with a bounded value function produced by an anytime algorithm. We empirically evaluate both of these modifications to existing algorithms and analyze their performance. While real-time Bayesian search control may reduce the number of backups until convergence for (1), it offers more substantive improvements for (2), making it an attractive real-time policy evaluation approach for MDPs.
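The abstract describes maintaining upper and lower bounds on the value function and interpreting them in a Bayesian way to score states by value of information. The sketch below is a minimal illustration of the general idea, not the paper's algorithm: it runs bounded value iteration on a hypothetical two-state MDP (states, transitions, and rewards are invented for illustration), initializing the lower bound pessimistically at zero and the upper bound from the maximum reward, and then uses the width of each state's bound interval as a simple uncertainty proxy of the kind a myopic VPI analysis would act on.

```python
# Hypothetical toy MDP for illustration; not taken from the paper.
GAMMA = 0.9
R_MAX = 1.0

# transitions[s][a] = list of (probability, next_state, reward)
T = {
    0: {'a': [(1.0, 1, 1.0)], 'b': [(1.0, 0, 0.0)]},
    1: {'a': [(1.0, 1, 0.5)], 'b': [(1.0, 0, 0.0)]},
}

def backup(V, s):
    """One Bellman backup of state s against value estimate V."""
    return max(
        sum(p * (r + GAMMA * V[s2]) for p, s2, r in outcomes)
        for outcomes in T[s].values()
    )

# Admissible initial bounds: 0 from below, R_MAX / (1 - gamma) from above.
lo = {s: 0.0 for s in T}
hi = {s: R_MAX / (1 - GAMMA) for s in T}

# Bounded value iteration: both bounds contract monotonically toward V*.
for _ in range(200):
    lo = {s: backup(lo, s) for s in T}
    hi = {s: backup(hi, s) for s in T}

def interval_width(s):
    """Width of the bound interval for state s. Under a uniform belief
    over [lo, hi], a state's myopic value of knowing its true value
    grows with this uncertainty, so wide intervals get explored first."""
    return hi[s] - lo[s]
```

On this toy MDP both bounds converge to the optimal values (V(0) = 5.5, V(1) = 5.0), so the interval widths shrink to zero; before convergence, the widths would rank states for Bayesian search control.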