On Constraints, Optimisation, Probability and Data Mining

Babaki, Behrouz; De Raedt, Luc; Guns, Tias

Author:

Babaki, Behrouz

De Raedt, Luc ; Guns, Tias

Abstract:

Constraint satisfaction and optimization (CSP(O)), probabilistic inference, and data mining are three important subdomains of artificial intelligence. CSP(O) investigates methods for efficiently solving combinatorial problems, probabilistic inference deals with answering queries about uncertain knowledge bases, while data mining aims at finding and modeling regularities in the data. Even though these domains have been developed independently, there are strong connections and interactions between them. Studying and extending their interactions is the main theme of this thesis. This thesis has five contributions. First, we build on existing methods in probabilistic inference and constraint programming and propose a method for adding probabilities to the CSP(O) models. This allows us to constrain or optimize over probability values. Second, we develop a novel algorithm for optimizing the expected utility in stochastic constraint programs. Unlike earlier works that assume independence of random variables, we assume a joint distribution represented by a Bayesian network. Third, we develop an algorithm for obtaining the exact solution of the constrained clustering problem with the maximum sum of squares objective. By using the column-generation framework, we decompose the problem into two components: one that generates candidate clusters that adhere to user constraints, and one that tries to find the optimal solution among combinations of the generated candidates. Fourth, we develop an exact algorithm to solve a special class of graph clustering problems. The formulation of this problem involves an exponential number of constraints. We propose a mechanism to incrementally include only a subset of these constraints in the problem. Fifth, we develop a mechanism for learning the distribution of taxi requests from large datasets of taxi trip records. We show that combining techniques from multiple domains can yield improvements in addressing a number of existing and new problems.