Spring Workshop on Mining and Learning (1st edition), Traben-Trarbach, Germany, 23-25 April 2008
We introduce a new machine learning technique for gene function prediction and investigate its performance on S. cerevisiae and A. thaliana. Two characteristics distinguish this task from common machine learning problems. First, a single gene may have multiple functions. Second, the functions are organized in a hierarchy, so a gene that is related to some function is automatically related to all of its "superfunctions" (this is called the hierarchy constraint). In machine learning, this problem setting is known as hierarchical multi-label classification (HMC). We present an HMC decision tree learner that makes predictions for all classes together, takes the hierarchy constraint into account, and can process hierarchies structured as directed acyclic graphs (DAGs). We compare it to other decision tree approaches for HMC, most of which learn a separate tree for each class. We show that our method outperforms previously published results on functional genomics tasks. Moreover, upgrading our method to an ensemble technique further increases predictive performance, at the cost of (partly) giving up interpretability.
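The hierarchy constraint is easiest to see on a small example. The sketch below is not the HMC decision tree learner described above. It only illustrates the constraint itself, under stated assumptions. The class names and the toy DAG (`toy_parents`) are hypothetical and are not taken from any real function catalogue. Each class maps to its direct parents, and "glycolysis" has two parents, which is what makes the hierarchy a DAG rather than a tree. `close_labels` extends a gene's label set with all superfunctions. `respects_constraint` checks that no class receives a higher predicted score than any of its parents. When this check holds and a single threshold is applied, predicting a class also means predicting all of its superfunctions.

```python
from typing import Dict, List, Set

# Hypothetical representation: each class maps to its direct parents
# ("superfunctions"). Several parents per class are allowed (DAG hierarchy).
Hierarchy = Dict[str, List[str]]


def ancestors(cls: str, parents: Hierarchy) -> Set[str]:
    """Return all superfunctions of `cls` (transitive closure over parent links)."""
    seen: Set[str] = set()
    stack = list(parents.get(cls, []))
    while stack:
        p = stack.pop()
        if p not in seen:  # a DAG can reach the same ancestor via several paths
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen


def close_labels(labels: Set[str], parents: Hierarchy) -> Set[str]:
    """Extend a gene's label set so that it satisfies the hierarchy constraint."""
    closed = set(labels)
    for cls in labels:
        closed |= ancestors(cls, parents)
    return closed


def respects_constraint(scores: Dict[str, float], parents: Hierarchy) -> bool:
    """True if no class scores higher than any of its direct parents.

    Assumes `scores` contains every class in `parents`. By transitivity,
    thresholding such scores never predicts a class without its ancestors.
    """
    return all(
        scores[cls] <= scores[p]
        for cls, ps in parents.items()
        for p in ps
    )


# Hypothetical toy hierarchy (illustrative names only).
toy_parents: Hierarchy = {
    "metabolism": [],
    "energy": [],
    "amino_acid_metabolism": ["metabolism"],
    "glycolysis": ["metabolism", "energy"],
}

print(sorted(close_labels({"glycolysis"}, toy_parents)))
# prints ['energy', 'glycolysis', 'metabolism']
```

Using an explicit stack instead of recursion keeps the ancestor search simple and avoids revisiting shared ancestors, which are common in DAG hierarchies.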