ITEM METADATA RECORD
Title: Fast, effective molecular feature mining by local optimization
Authors: Zimmermann, Albrecht; Bringmann, Björn; Rückert, Ulrich #
Issue Date: Sep-2010
Publisher: Springer
Host Document: Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2010, Barcelona, Spain, September 20-24, 2010, Proceedings, Part III pages:563-578
Series Title: Lecture Notes in Computer Science
Conference: PKDD edition:14 location:Barcelona date:20-24 September, 2010
Abstract: In structure-activity relationships (SAR) one aims at finding classifiers that predict the biological or chemical activity of a compound from its molecular graph. Many approaches to SAR use sets of binary substructure features, which test for the occurrence of certain substructures in the molecular graph. As an alternative to enumerating very large sets of frequent patterns, numerous pattern set mining and pattern set selection techniques have been proposed. Existing approaches can be broadly classified into those that focus on minimizing correspondences, that is, the number of pairs of training instances from different classes with identical encodings, and those that focus on maximizing the number of equivalence classes, that is, unique encodings in the training data. In this paper we evaluate a number of techniques to investigate which criterion is a better indicator of predictive accuracy. We find that minimizing correspondences is a necessary but not sufficient condition for good predictive accuracy, that equivalence classes are a better indicator of success, and that it is important to have a good match between training set and pattern set size. Based on these results we propose a new, improved algorithm which performs local minimization of correspondences, yet evaluates the effect of patterns on equivalence classes globally. Empirical experiments demonstrate its efficacy and its superior run-time behavior.
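Illustration (not part of the original record): the two criteria named in the abstract can be computed directly from binary feature encodings. The sketch below is a minimal reading of the abstract's definitions, not the authors' algorithm; the function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def count_equivalence_classes(encodings):
    """Number of unique binary encodings in the training data."""
    return len({tuple(enc) for enc in encodings})


def count_correspondences(encodings, labels):
    """Number of instance pairs from different classes with identical encodings."""
    # Group instances by encoding and count the class labels within each group.
    per_encoding = defaultdict(Counter)
    for enc, label in zip(encodings, labels):
        per_encoding[tuple(enc)][label] += 1

    total = 0
    for counts in per_encoding.values():
        n = sum(counts.values())
        all_pairs = n * (n - 1) // 2
        same_class_pairs = sum(c * (c - 1) // 2 for c in counts.values())
        # Pairs sharing an encoding but not a class label.
        total += all_pairs - same_class_pairs
    return total


# Hypothetical toy data: four compounds, two substructure features each.
encodings = [(1, 0), (1, 0), (0, 1), (1, 0)]
labels = ["active", "inactive", "active", "inactive"]

n_classes = count_equivalence_classes(encodings)
n_corr = count_correspondences(encodings, labels)
```

In this toy example the encoding (1, 0) is shared by one active and two inactive compounds, so it contributes two cross-class pairs, while the data contain two distinct encodings overall.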
Publication status: published
KU Leuven publication type: IC
Appears in Collections: Informatics Section
# (joint) last author

Files in This Item:
File: main.pdf; Status: Published; Size: 189 Kb; Format: Adobe PDF