Department of Computer Science, Katholieke Universiteit Leuven
CW Reports, vol. CW418, pages: 13
Probability trees (or Probability Estimation Trees, PETs) are decision trees with probability distributions in the leaves. Several approaches for learning probability trees have been proposed in the literature, but no thorough comparison of these alternatives exists so far.
In this paper we experimentally compare the main approaches using the relational decision tree learner Tilde, on both non-relational and relational datasets. In addition to the main existing approaches, we also consider a novel variant of an existing approach based on the Bayesian Information Criterion (BIC). Our main conclusion is that trees built using the C4.5 approach or the C4.4 approach (C4.5 without post-pruning) typically have the best predictive performance. If the number of classes is low, however, BIC performs equally well. An additional advantage of BIC is that its trees are considerably smaller than those built with the C4.5 or C4.4 approach.
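To make the notion of "a probability distribution in a leaf" concrete, the following is a minimal sketch, not the report's implementation. It assumes a leaf simply stores the class labels of the training examples that reach it; the function name `leaf_distribution` and the optional Laplace correction are illustrative assumptions that the abstract does not specify.

```python
from collections import Counter


def leaf_distribution(labels, classes, laplace=False):
    """Estimate class probabilities for a single leaf (hypothetical helper).

    labels  -- class labels of the training examples that reach the leaf
    classes -- all class values in the learning problem
    laplace -- if True, apply a Laplace correction (an assumption made here
               for illustration; the abstract does not say which smoothing,
               if any, each compared approach uses)
    """
    counts = Counter(labels)
    n = len(labels)
    k = len(classes)
    if laplace:
        # (count + 1) / (n + k): no class gets probability exactly 0 or 1
        return {c: (counts[c] + 1) / (n + k) for c in classes}
    if n == 0:
        raise ValueError("cannot estimate raw frequencies from an empty leaf")
    # Plain relative frequencies of each class in the leaf
    return {c: counts[c] / n for c in classes}


# A leaf reached by three positive examples and one negative example
raw = leaf_distribution(["pos", "pos", "pos", "neg"], ["pos", "neg"])
smoothed = leaf_distribution(["pos", "pos", "pos", "neg"], ["pos", "neg"], laplace=True)
```

A classification tree would report only the majority class of this leaf, whereas a probability tree returns the whole distribution.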