The main challenge of generic object classification, i.e. determining whether an instance of a given object class is present, is that objects of the same class vary largely in appearance. These intra-class differences can even be small compared to the differences with other categories. A successful classifier must be invariant to the intra-class variability while remaining discriminative with respect to the inter-class differences. This thesis contributes to making the relevant differences count.

This thesis proposes to improve generic object classification by introducing more flexible and richer representations to model certain types of variations. In particular, it learns and models the variations in spatial location, size and appearance of objects, as well as interactions with other object categories and their surroundings. This way, we can learn to distinguish the intra-class and inter-class variations better when given only class labels, which helps to improve visual object classification.

In the first part of the thesis, we address the variability in spatial location and size of objects by introducing a novel object representation that adds spatial information to the standard bag-of-words representation. We formulate our method in a general setting as inferring additional unobserved, or latent, dependent parameters. In particular, we focus on two such types of parameters. The first type specifies a cropping operation, which determines a bounding box in the image; this box serves to eliminate non-representative object parts and background. The second type specifies a splitting operation, which corresponds to a non-uniform image decomposition into four quadrants, i.e. a generalization of the pyramidal bag-of-words.

In addition to variability in their spatial configuration, objects in the same category can differ in their parts and background.
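The latent cropping and splitting operations can be sketched as follows. This is a minimal illustration, not the thesis implementation: the feature layout (lists of `(x, y, word_id)` local features), the candidate sets of boxes and split points, and the linear scoring function are all simplifying assumptions.

```python
import numpy as np

def bow_histogram(features, box, vocab_size):
    """Normalized bag-of-words histogram over local features inside `box`.
    features: iterable of (x, y, word_id); box: (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    h = np.zeros(vocab_size)
    for x, y, w in features:
        if x0 <= x < x1 and y0 <= y < y1:
            h[int(w)] += 1
    n = h.sum()
    return h / n if n > 0 else h

def split_histogram(features, box, split, vocab_size):
    """Non-uniform 4-quadrant decomposition: the split point (sx, sy) is a
    latent parameter, generalizing the fixed spatial-pyramid grid."""
    x0, y0, x1, y1 = box
    sx, sy = split
    quads = [(x0, y0, sx, sy), (sx, y0, x1, sy),
             (x0, sy, sx, y1), (sx, sy, x1, y1)]
    return np.concatenate([bow_histogram(features, q, vocab_size) for q in quads])

def best_latent_score(features, w, vocab_size, boxes, splits):
    """Infer the latent crop and split by maximizing a linear score w . h
    over candidate parameters, in the spirit of latent-variable models."""
    best = -np.inf
    for box in boxes:           # candidate cropping operations
        for split in splits:    # candidate splitting operations
            h = split_histogram(features, box, split, vocab_size)
            best = max(best, float(w @ h))
    return best
```

At test time, the classifier score for an image is the maximum over the latent crop and split parameters; training would alternate between inferring these parameters and updating `w`.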
In the second part, we propose an object classification method that better handles the complexity of real-world images by jointly learning and localizing not only the object, but also a crude layout of its constituent parts as well as the background. We consider the object of interest as a composition of parts that can be placed together to better model its visual appearance. Furthermore, once the object (or foreground) is localized, we also model the background as a composition of constituent parts. In order to enforce coherence in the models and better cope with appearance noise, we also learn pairwise relationships between adjacent parts. This allows us to avoid unlikely part configurations and therefore false positive responses.

In the third part, we focus on learning the inter-class differences between visually similar object categories. We show that jointly learning and localizing pairwise relations between visually similar classes improves such classification: when having to decide whether or not a specific target class is present, sharing knowledge about other, auxiliary classes supports this decision. In particular, we propose a framework that combines target-class-specific global and local information with information learnt for pairs of the target class and each of a number of auxiliary classes. Adding such pairwise information helps to learn the common part and context for a class pair and to discriminate against other classes.

We evaluate all proposed methods on realistic datasets and compare them against previous, related methods. Extensive experimental evaluations show that modeling and learning the variations in spatial location, appearance and interactions with other object categories and their surroundings improve visual object classification.
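The idea of pairwise relationships between adjacent parts can be illustrated with a small scoring sketch. The part names, unary appearance scores and pairwise compatibility tables below are hypothetical, and the exhaustive search stands in for whatever inference the actual model uses; the point is only that pairwise terms penalize unlikely configurations of adjacent parts.

```python
import itertools

def layout_score(labels, unary, pairwise, adjacency):
    """Score a joint part labeling: unary[p][l] measures how well model l
    fits part p; pairwise[(p, q)][(lp, lq)] scores the compatibility of
    the labels chosen for adjacent parts p and q."""
    s = sum(unary[p][labels[p]] for p in labels)
    s += sum(pairwise[(p, q)][(labels[p], labels[q])] for p, q in adjacency)
    return s

def best_layout(parts, n_labels, unary, pairwise, adjacency):
    """Exhaustive search over joint labelings (tractable only for tiny
    layouts); incoherent configurations lose through the pairwise terms."""
    best, best_labels = float("-inf"), None
    for combo in itertools.product(range(n_labels), repeat=len(parts)):
        labels = dict(zip(parts, combo))
        s = layout_score(labels, unary, pairwise, adjacency)
        if s > best:
            best, best_labels = s, labels
    return best_labels, best
```

Even when a weaker appearance model wins on one part in isolation, the pairwise term can pull the joint labeling toward a coherent configuration, which is what suppresses false positive responses.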
Bilen H., ''Object classification with latent parameters'', PhD thesis (dissertation submitted for the degree of Doctor of Engineering Science), KU Leuven, Leuven, Belgium, November 2013.