Benelearn, Date: 2015/06/19 - 2015/06/19, Location: Delft

Publication date: 2015-06-01

Benelearn 2015 Poster presentations (online)

Author:

Van Craenendonck, Toon
Blockeel, Hendrik

Keywords:

clustering, internal validity measures

Abstract:

An obvious question to ask yourself when you want to cluster a data set is: which algorithm should I use? Given the variety of existing clustering algorithms, answering this question is far from trivial. A straightforward strategy is to simply run several algorithms, with a number of different parameter configurations, and afterwards select the best clustering from the generated set of solutions. But how do we select the best one? One way to do this is by using internal validity measures, which map a clustering to a number indicating its quality. For this strategy to be valid, we need an internal measure that allows for a fair comparison between clustering algorithms. We have experimented with four of these validity measures and six clustering algorithms. We observed some undesired properties for each of the measures, making them unsuitable for such a comparison.
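
The selection strategy described in the abstract (generate several candidate clusterings, score each with an internal validity measure, keep the highest-scoring one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' experimental setup: the abstract does not name the four measures or the six algorithms, so the silhouette width is used here purely as a stand-in, the candidate labelings are hard-coded instead of being produced by real clustering algorithms, and the names `silhouette`, `select_best`, `points` and `candidates` are hypothetical. The abstract's conclusion still applies: a selection made this way is only as trustworthy as the measure, and the authors report undesired properties for every measure they tried.

```python
import math


def silhouette(points, labels):
    """Mean silhouette width of a labeling (range [-1, 1], higher is better).

    Stand-in internal validity measure; the abstract does not say which
    measures were actually evaluated.
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def select_best(points, candidates, measure=silhouette):
    """Score every candidate labeling and return the best name plus all scores."""
    scores = {name: measure(points, labels) for name, labels in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy data: two well-separated groups of three points each (illustrative only).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Hypothetical outputs of different algorithms / parameter configurations.
candidates = {
    "two_clusters": [0, 0, 0, 1, 1, 1],
    "mixed": [0, 1, 0, 1, 0, 1],
    "three_clusters": [0, 0, 1, 2, 2, 2],
}

best, scores = select_best(points, candidates)
print(best, scores)
```

Passing the measure as a parameter makes it easy to swap in a different internal measure and check whether the selected clustering changes, which is one simple way to see how strongly the outcome depends on the measure chosen.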