Machine learning

Publication date: 2008-01-01
Volume: 73 Pages: 185 - 214
Publisher: Springer New York LLC

Author:

Vens, Celine ; Struyf, Jan ; Schietgat, Leander ; Dzeroski, Saso ; Blockeel, Hendrik

Keywords:

hierarchical multi-label classification, functional genomics, decision trees, Science & Technology, Technology, Computer Science, Artificial Intelligence, Hierarchical classification, Multi-label classification, Precision-recall analysis, EXPRESSION, DATABASE, 0801 Artificial Intelligence and Image Processing, 0806 Information Systems, 1702 Cognitive Sciences, Artificial Intelligence & Image Processing, 4611 Machine learning

Abstract:

Hierarchical multi-label classification (HMC) is a variant of classification where instances may belong to multiple classes at the same time and these classes are organized in a hierarchy. This article presents several approaches to the induction of decision trees for HMC, as well as an empirical study of their use in functional genomics. We compare learning a single HMC tree (which makes predictions for all classes together) to two approaches that learn a set of regular classification trees (one for each class). The first approach defines an independent single-label classification task for each class (SC). Obviously, the hierarchy introduces dependencies between the classes. While they are ignored by the first approach, they are exploited by the second approach, named hierarchical single-label classification (HSC). Depending on the application at hand, the hierarchy of classes can be such that each class has at most one parent (tree structure) or such that classes may have multiple parents (DAG structure). The latter case has not been considered before and we show how the HMC and HSC approaches can be modified to support this setting. We compare the three approaches on 24 yeast data sets using as classification schemes MIPS's FunCat (tree structure) and the Gene Ontology (DAG structure). We show that HMC trees outperform HSC and SC trees along three dimensions: predictive accuracy, model size, and induction time. We conclude that HMC trees should definitely be considered in HMC tasks where interpretable models are desired.
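
To make the distinction between the tree and DAG settings, and between the HMC and SC formulations, more concrete, the following is a minimal Python sketch and not the authors' implementation. The toy hierarchy and the names `PARENTS`, `is_tree`, `with_ancestors`, `hmc_target`, and `sc_targets` are hypothetical. The sketch also assumes the usual convention that an instance labelled with a class belongs to all of that class's ancestors, which the abstract itself does not spell out.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy hierarchy: each class maps to the list of its parents.
# A tree allows at most one parent per class; a DAG allows several.
PARENTS: Dict[str, List[str]] = {
    "A": [],
    "B": [],
    "A.1": ["A"],
    "A.2": ["A"],
    "AB": ["A", "B"],  # two parents, so this hierarchy is a DAG
}
CLASSES: List[str] = sorted(PARENTS)  # fixed class order for label vectors


def is_tree(parents: Dict[str, List[str]]) -> bool:
    """True if every class has at most one parent."""
    return all(len(p) <= 1 for p in parents.values())


def with_ancestors(labels: Iterable[str],
                   parents: Dict[str, List[str]]) -> Set[str]:
    """Close a label set upward (assumed convention: a class implies its ancestors)."""
    closed: Set[str] = set()
    stack = list(labels)
    while stack:
        cls = stack.pop()
        if cls not in closed:
            closed.add(cls)
            stack.extend(parents[cls])
    return closed


def hmc_target(labels: Iterable[str],
               parents: Dict[str, List[str]],
               classes: List[str]) -> List[int]:
    """One 0/1 vector over all classes: the joint target a single HMC model predicts."""
    closed = with_ancestors(labels, parents)
    return [1 if c in closed else 0 for c in classes]


def sc_targets(dataset_labels: List[Set[str]],
               parents: Dict[str, List[str]],
               classes: List[str]) -> Dict[str, List[int]]:
    """One independent binary target column per class, as in the SC setup."""
    vectors = [hmc_target(labels, parents, classes) for labels in dataset_labels]
    return {c: [v[i] for v in vectors] for i, c in enumerate(classes)}
```

With the class order `["A", "A.1", "A.2", "AB", "B"]`, `hmc_target({"AB"}, PARENTS, CLASSES)` gives `[1, 0, 0, 1, 1]`, because the label propagates to both parents. `sc_targets` splits the same information into separate per-class columns, each of which would be handed to its own single-label learner.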