
CW Reports

Publication date: 2013-11-01
Publisher: Department of Computer Science, KU Leuven; Leuven, Belgium

Author:

Zimmermann, Albrecht

Abstract:

Data for training a classification model can be seen as consisting of two types of points: easy-to-classify points, typical for a class, and difficult-to-classify points, atypical for a class and often lying on class boundaries. Most existing techniques deal with atypical points in later stages of model building, after typical points have been modeled. As a result, atypical points are often modeled only if doing so improves on the model of the typical points. An alternative view treats atypical points as outliers with respect to the class to which they supposedly belong. Based on this view, we introduce the concept of class outliers, whose immediate neighborhoods we use to construct discriminative features. We investigate ways of employing the newly derived features and compare the quality of the resulting models against results on unaugmented data for a variety of UCI benchmark sets. We find that, while some overfitting control can be necessary, the newly derived features improve the classification accuracy of SVM, Naive Bayes, and C4.5 classifiers.
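The report itself defines how class outliers are detected and how their neighborhoods yield new features; as a rough illustration only, the sketch below flags class outliers by k-nearest-neighbor label disagreement and appends distances to those points as extra features. The function names (class_outliers, outlier_features), the disagreement threshold, and the distance-based feature construction are assumptions for illustration, not the paper's actual procedure.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def class_outliers(X, y, k=5, disagreement=0.5):
    """Flag training points whose k nearest neighbors mostly carry a different label.

    Illustrative heuristic: a point is treated as a class outlier when at least
    `disagreement` of its k neighbors belong to another class.
    """
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)              # idx[:, 0] is the point itself
    neighbor_labels = y[idx[:, 1:]]
    mismatch = (neighbor_labels != y[:, None]).mean(axis=1)
    return np.where(mismatch >= disagreement)[0]

def outlier_features(X, X_train, outlier_idx):
    """Augment X with distances to each flagged class outlier (assumed construction)."""
    centers = X_train[outlier_idx]
    # Euclidean distance from every point to every class outlier
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.hstack([X, d])

# Usage sketch: augment train and test data, then train any standard classifier
# (e.g. an SVM) on the augmented representation.
# out_idx   = class_outliers(X_train, y_train, k=5)
# X_train_a = outlier_features(X_train, X_train, out_idx)
# X_test_a  = outlier_features(X_test, X_train, out_idx)
```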