Lecture notes in computer science vol:4701 pages:359-370
European Conference on Machine Learning edition:18 location:Warsaw, Poland date:September 17-21, 2007
Constrained clustering investigates how to incorporate domain knowledge in the clustering process. The domain knowledge takes the form of constraints that must hold on the set of clusters. We consider instance-level constraints, such as must-link and cannot-link. Constraints of this type have been successfully used in popular clustering algorithms, such as k-means and hierarchical agglomerative clustering. This paper shows how clustering trees can support instance-level constraints. Clustering trees are decision trees that partition the instances into homogeneous clusters and provide a symbolic description for each cluster. To handle non-trivial constraint sets, we extend clustering trees to support disjunctive descriptions. The paper's main contribution is ClusILC, an efficient algorithm for building such trees. We present experiments comparing ClusILC to COP-k-means.
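
To make the notion of instance-level constraints concrete, the following minimal sketch counts how many must-link and cannot-link constraints a given cluster assignment violates. It is neither ClusILC nor COP-k-means. The function name `count_violations`, the dictionary representation of the clustering, and the toy data are assumptions made purely for illustration.

```python
from typing import Hashable, Iterable, Mapping, Tuple

# A constraint is a pair of instance identifiers (hypothetical representation).
Pair = Tuple[Hashable, Hashable]


def count_violations(
    assignment: Mapping[Hashable, int],
    must_link: Iterable[Pair],
    cannot_link: Iterable[Pair],
) -> Tuple[int, int]:
    """Return (violated must-link count, violated cannot-link count).

    A must-link (a, b) is violated when a and b lie in different clusters.
    A cannot-link (a, b) is violated when a and b lie in the same cluster.
    """
    ml_violations = sum(1 for a, b in must_link if assignment[a] != assignment[b])
    cl_violations = sum(1 for a, b in cannot_link if assignment[a] == assignment[b])
    return ml_violations, cl_violations


# Toy data (illustrative only): four instances assigned to two clusters.
assignment = {"x1": 0, "x2": 0, "x3": 1, "x4": 1}
must_link = [("x1", "x2"), ("x2", "x3")]
cannot_link = [("x1", "x4"), ("x3", "x4")]

print(count_violations(assignment, must_link, cannot_link))  # (1, 1)
```

Here the must-link pair (x2, x3) is split across clusters and the cannot-link pair (x3, x4) shares a cluster, so one constraint of each kind is violated.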