Lecture Notes in Computer Science vol:6804 pages:346-357
International Symposium on Methodologies for Intelligent Systems edition:19 location:Warsaw, Poland date:28-30 June 2011
The goal of clustering is to form groups of similar elements. Quality criteria for clusterings, as well as the notion of similarity, depend strongly on the application domain, which explains the existence of many different clustering algorithms and similarity measures. In this paper we focus on clustering annotated nodes in a graph with k-means-like algorithms, where the similarity between nodes depends on both their annotations and their context in the graph ("hybrid" similarity). We show that, for the similarity measure we focus on, k-means itself cannot trivially be applied. We propose three alternatives and evaluate them empirically on the Cora dataset. We find that using these alternative clustering algorithms with the hybrid similarity can be advantageous over using standard k-means with a purely annotation-based similarity.