Complex aggregates in relational learning

Vens, Celine; Blockeel, Hendrik

Author:

Vens, Celine

Blockeel, Hendrik

Keywords:

relational learning, aggregation

Abstract:

In relational learning one learns patterns from relational databases, which usually contain multiple tables that are interconnected via relations. These relations may be of one-to-many or many-to-many cardinality ratios. Thus, an example for which a prediction is to be given may be related to a set of objects that are possibly relevant for that prediction. Relational classifiers differ with respect to how they handle these sets: some use properties of the set as a whole (using aggregation), some refer to properties of specific individuals of the set, however, most classifiers do not combine both. This imposes an undesirable bias on these learners. This dissertation describes a learning approach that avoids this bias, by using complex aggregates, i.e., aggregates that impose selection conditions on the set to aggregate on. This combination of aggregates and selections presents several difficulties. First, the search space is substantially increased, and second, the generality order of the hypotheses that is assumed by many relational learners is violated. This implies that one either has to give up on efficiency or on completeness when searching the hypothesis space using classical refinement operators. We develop a general refinement framework that considers the complete search space, and traverses it in a general-to-specific, hence efficient, way. Complex aggregates are included in an existing relational learner that constructs relational decision trees. We argue that in this context, the generality ordering can not be violated, and classical refinement operators can be applied. To improve efficiency, we present two techniques: an application of the developed refinement framework and an upgrade of the relational decision tree algorithm to a relational random forest inducer. The use of complex aggregates is also studied in the consequent of a hypothesis. More precisely, we investigate the use of complex aggregates in the linear models built by a relational model tree learner. This involves upgrading the relational decision tree algorithm to a relational model tree learning system. The main contribution in this work is the development of an efficient heuristic function suitable for learning model trees.