ICAPS 2008 Workshop on Oversubscribed Planning & Scheduling
18th International Conference on Automated Planning and Scheduling (ICAPS 2008), Sydney, Australia, 14-18 September 2008
Many oversubscribed planning problems with multiple (potentially jointly unachievable) goals can be cast in the Markov decision process (MDP) framework once appropriate reward functions and objective criteria are defined. Recent real-time, search-based solutions to MDPs maintain upper and lower bounds on the value function at all stages of execution prior to convergence. While many algorithms already use this information to detect convergence and to prioritize state exploration, we propose a novel Bayesian interpretation of these bounds to evaluate the myopic value of perfect information (VPI) of exactly knowing a state’s value. This VPI analysis permits two general Bayesian search control modifications to existing algorithms: (1) we can use it to dynamically adapt the backup depth in real-time, search-based bounded dynamic programming approaches, and (2) we can use it for real-time policy evaluation with a bounded value function produced by an anytime algorithm. We empirically evaluate both of these modifications and analyze their performance. While real-time Bayesian search control may reduce the number of backups until convergence for (1), it offers more substantive improvements for (2), making it an attractive real-time policy evaluation approach for MDPs.
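To make the idea concrete, one way to read the bounds Bayesianly is to treat each unknown Q-value as uniformly distributed between its lower and upper bound and compute a myopic VPI in the style of Russell and Wefald: information about an action is only worth something if it could change which action looks best. The sketch below illustrates this under a uniform posterior; the function names, the uniform assumption, and the exact VPI formulation are illustrative and not taken from the paper itself.

```python
# Illustrative myopic-VPI computation from Q-value bounds.
# Assumption (not from the paper): the true Q(s, a) is modelled as
# Uniform(lower, upper) between its maintained bounds.

def expected_gain_above(c, lo, hi):
    """E[max(0, X - c)] for X ~ Uniform(lo, hi)."""
    if c >= hi:
        return 0.0
    if c <= lo:
        return (lo + hi) / 2.0 - c
    return (hi - c) ** 2 / (2.0 * (hi - lo))

def expected_gain_below(c, lo, hi):
    """E[max(0, c - X)] for X ~ Uniform(lo, hi)."""
    if c <= lo:
        return 0.0
    if c >= hi:
        return c - (lo + hi) / 2.0
    return (c - lo) ** 2 / (2.0 * (hi - lo))

def myopic_vpi(q_bounds):
    """Myopic VPI of learning each action's Q-value exactly.

    q_bounds: dict mapping action -> (lower, upper) bounds on Q(s, a).
    Returns a dict mapping action -> expected decision-quality gain.
    """
    means = {a: (lo + hi) / 2.0 for a, (lo, hi) in q_bounds.items()}
    ranked = sorted(means, key=means.get, reverse=True)
    best, second = ranked[0], ranked[1]
    vpi = {}
    for a, (lo, hi) in q_bounds.items():
        if a == best:
            # Learning the best action's value helps only if its true
            # value falls below the runner-up's expected value.
            vpi[a] = expected_gain_below(means[second], lo, hi)
        else:
            # Learning another action's value helps only if its true
            # value exceeds the current best's expected value.
            vpi[a] = expected_gain_above(means[best], lo, hi)
    return vpi
```

For example, with bounds `{"a1": (0.2, 1.0), "a2": (0.4, 0.6)}`, action `a1` has the higher mean but also the wider interval, so it carries all of the VPI; the tightly bounded `a2` cannot overturn the decision. A search control rule of the kind the abstract describes would then spend backups only where such VPI remains non-negligible.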