
A Probabilistic Prolog and its Applications (Een probabilistische prolog en zijn toepassingen)

Publication date: 2010-11-30

Author:

Kimmig, Angelika

Keywords:

probabilistic logic learning, probabilistic programming

Abstract:

One of the key challenges in artificial intelligence is the integration of machine learning, relational knowledge representation languages and reasoning under uncertainty. Given its multi-disciplinary nature, this topic has been approached from different angles, leading to a very active research area known as probabilistic logic learning or statistical relational learning.

Motivated by the need for a probabilistic logic language with an implementation that supports reasoning in large networks of uncertain links, as they arise for instance when integrating information from various databases, this thesis introduces ProbLog, a simple extension of the logic programming language Prolog with independent random variables in the form of probabilistic facts. While ProbLog shares its distribution semantics with other approaches developed in the field of probabilistic logic learning, its implementation has been the first to allow for scalable inference in networks of probabilistic links. This is due to the use of advanced data structures that make it possible to base probabilistic inference on proofs or explanations even if those are not mutually exclusive in terms of the possible worlds they cover. As a general purpose probabilistic logic programming language, ProbLog provides a framework for lifting relational learning approaches to the probabilistic context. We apply this methodology to obtain three new probabilistic relational learning techniques. The first one, theory compression, is a form of theory revision that reduces a ProbLog program to a given maximum size by deleting probabilistic facts, using example queries that should or should not be provable with high probability as a guideline. The second example, probabilistic explanation based learning, naturally extends explanation based learning to a probabilistic setting by choosing the most likely proof of an example for generalization. The third approach, probabilistic query mining, combines multi-relational data mining with scoring functions based on probabilistic databases. The latter two techniques are also used to illustrate the application of ProbLog as a framework for reasoning by analogy, where the goal is to identify examples with a high probability of being covered by logical queries that also assign high probability to given training examples. While these approaches all rely on ProbLog programs with known probability labels, we also introduce a parameter estimation technique for ProbLog that determines those labels based on examples in the form of queries or proofs augmented with desired probabilities. Throughout the thesis, we experimentally evaluate the methods we develop in the context of a large network of uncertain information gathered from biological databases.

While the main focus of this thesis lies on ProbLog and its applications, we also contribute a second framework called wProbLog, which generalizes ProbLog to weight labels from an arbitrary commutative semiring. We introduce inference algorithms that calculate weights of derived queries, showing that the ideas underlying the ProbLog system carry over to more general types of weights.
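To illustrate the kind of program the abstract describes, the minimal sketch below gives a ProbLog program over a small probabilistic link network. The edges, their probability labels, and the path predicate are illustrative assumptions, not taken from the thesis, and the query notation follows later ProbLog releases and may differ from the implementation described here. Each probabilistic fact p::f declares an independent random variable that is true with probability p, and the success probability of a query is its probability over the possible worlds induced by these facts.

    % Probabilistic facts: each edge is an independent random variable
    % that holds with the given probability (illustrative values).
    0.9::edge(a, b).
    0.8::edge(b, c).
    0.6::edge(a, c).

    % Background knowledge: ordinary definite clauses.
    path(X, Y) :- edge(X, Y).
    path(X, Y) :- edge(X, Z), path(Z, Y).

    % Ask for the success probability of the query
    % under the distribution semantics.
    query(path(a, c)).

In this sketch, path(a, c) has two explanations, edge(a, c) and edge(a, b) together with edge(b, c), which are not mutually exclusive in terms of the worlds they cover. Its probability is therefore 1 - (1 - 0.6)(1 - 0.9 * 0.8) = 0.888 rather than the sum of the explanation probabilities, which is exactly the disjoint-sum problem that the advanced data structures mentioned in the abstract are used to handle.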