Title: Improving the accuracy of similarity measures by using link information
Authors: Witsenburg, Tijn; Blockeel, Hendrik
Issue Date: 2011
Publisher: Springer
Host Document: Lecture Notes in Computer Science vol:6804 pages:501-512
Conference: International Symposium on Methodologies for Intelligent Systems edition:19 location:Warsaw, Poland date:28-30 June 2011
Abstract: The notion of similarity is crucial to a number of tasks and methods in machine learning and data mining, including clustering and nearest neighbor classification. In many contexts, there is a natural (but not necessarily optimal) similarity measure defined on the objects to be clustered or classified, as well as information about which objects are linked together. This raises the question of to what extent the information contained in the links can be used to obtain a more relevant similarity measure. Earlier research has shown empirically that including such link information yields more accurate results, but did not analyze why this is the case. In this paper we provide such an analysis. We relate the extent to which improved results can be obtained to the notions of homophily in the network, transitivity of similarity, and content variability of objects. We explore this relationship using randomly generated datasets in which we vary the amount of homophily and content variability. The results show that, within a fairly wide range of values for these parameters, including link information in the similarity measure indeed yields improved results compared to computing the similarity of objects directly from their content.
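Illustration (not taken from the paper): the abstract does not state how link information enters the similarity measure, so the following is only a minimal sketch under stated assumptions. It assumes objects are described by numeric content vectors, uses cosine similarity as the content-based measure, and averages that measure over the linked neighborhoods of two objects. The function names `content_similarity`, `link_similarity`, and `combined_similarity`, the weight `alpha`, and the toy data are all hypothetical.

```python
import math


def content_similarity(u, v):
    """Cosine similarity between two content vectors (assumed content measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def link_similarity(i, j, content, links):
    """Average content similarity over the neighborhoods of i and j.

    Hypothetical definition: each neighborhood is the object itself plus
    the objects it is linked to.
    """
    neigh_i = set(links.get(i, ())) | {i}
    neigh_j = set(links.get(j, ())) | {j}
    total = sum(
        content_similarity(content[a], content[b])
        for a in neigh_i
        for b in neigh_j
    )
    return total / (len(neigh_i) * len(neigh_j))


def combined_similarity(i, j, content, links, alpha=0.5):
    """Weighted mix of the two measures; alpha is an assumed parameter."""
    direct = content_similarity(content[i], content[j])
    linked = link_similarity(i, j, content, links)
    return alpha * direct + (1.0 - alpha) * linked


# Toy data (invented for illustration): object "b" has atypical content
# but is linked to "d", whose content matches that of "a" and "c".
content = {"a": [1, 0], "b": [0, 1], "c": [1, 0], "d": [1, 0]}
links = {"a": {"c"}, "c": {"a"}, "b": {"d"}, "d": {"b"}}

print(content_similarity(content["a"], content["b"]))
print(link_similarity("a", "b", content, links))
print(combined_similarity("a", "b", content, links))
```

In this toy data the content vectors of "a" and "b" are orthogonal, so their direct similarity is zero, while the neighborhood average is pulled up by the linked object "d". This mirrors the intuition in the abstract that links can compensate for variable content when linked objects tend to be similar (homophily); it makes no claim about the measures actually evaluated in the paper.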
ISSN: 0302-9743
Publication status: published
KU Leuven publication type: IC
Appears in Collections: Informatics Section

Files in This Item:
File          Description   Status     Size   Format
paper_90.pdf  main article  Published  182Kb  Adobe PDF


