
From Multi-Channel Vision Towards Active Exploration

Publication date: 2011-10-24

Authors:

Chumerin, Nikolay
Van Hulle, Marc

Keywords:

Independent motion detection; object recognition; vergence control; version control; biologically-motivated models

Abstract:

This thesis is a collection of three studies investigating the multi-channel processing of visual information in biologically-inspired computer vision systems. The three studies are interconnected and supported by auxiliary work on object recognition.

The first study (Chapter 2) focuses on a biologically-inspired multi-channel vision approach to independent motion detection (IMD). The goal is to detect objects that move independently of the moving observer. For example, a video camera mounted in a car "sees" a constantly moving environment while the car is driving. In this case, the motion perceived by the camera is caused both by the self-motion of the car and by the independent motion of other objects (e.g., vehicles or pedestrians). The task is then to differentiate the independently moving objects (IMOs) from the motion induced by the moving observer in the environment (which is static with respect to the Earth). In this chapter we propose an IMD approach that uses several channels extracted from the input visual stream to create a so-called independent motion (IM) map, in which the intensity of each pixel encodes the likelihood of that pixel being part of an IMO. Several extensions of the proposed IMD model are also presented and described. All of these extended models involve an additional appearance-based object recognition channel, which is used to upgrade the representation of the detected independent motion from a pixel-based form to an object-based one (a set of IMO locations and descriptions).

In the second study (Chapter 4) we move from the passive exploration of the surrounding world, addressed in the previous study, towards active exploration. By active exploration we mean here the ability of the system to move (or, more precisely, rotate) both cameras of the stereo setup. As a first step towards a complete active exploration scenario, we consider the simplified case of horizontal vergence control (VC), whose goal is to verge both cameras on a target object. By vergence we mean the horizontal (pan) rotation of both cameras in opposite directions, which brings the fixation point (the intersection of the cameras' optical axes) onto the surface of the target object. This form of vergence requires only horizontal rotation of both cameras, which can easily be realized on the given (pan-tilt) robotic head by a symmetric pan rotation of both cameras in opposite directions, while the common tilt angle is kept fixed. In Chapter 4 we propose and evaluate two neural models for vergence control. Both models use the input stereo images to estimate the desired vergence angle (the angle between the cameras' optical axes). The first model assumes that the gaze direction of the robotic head is orthogonal to the baseline and that the stimulus is a frontoparallel plane orthogonal to the gaze direction. The second model goes beyond these assumptions and operates reliably in the general case, where all restrictions on the orientation of the gaze, as well as on the target's position, type and orientation, are dropped.

In the third study (Chapter 5) we move to the next level of the active exploration hierarchy by considering both vergence and version eye movements. By version eye movement we mean the rotational movement of both eyes in the same direction (see the illustrative sketch below).
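To make the vergence/version terminology concrete, the following minimal sketch covers only the simplest geometric case described above (gaze orthogonal to the baseline, frontoparallel target). The function name and the numbers (10 cm baseline, target at 1 m) are illustrative assumptions, not values from the thesis; the models themselves estimate the vergence angle from the stereo images rather than from a known target distance.

    import numpy as np

    def symmetric_pan_angles(baseline, distance):
        """Pan angles (radians) that bring the fixation point onto a target
        at 'distance' along the cyclopean gaze, assuming the gaze is
        orthogonal to the baseline (the frontoparallel case). Each camera
        rotates by half the vergence angle, in opposite directions; the
        difference of the two pan angles is the vergence angle, their mean
        is the version angle (zero in this symmetric case)."""
        vergence = 2.0 * np.arctan(baseline / (2.0 * distance))
        return +vergence / 2.0, -vergence / 2.0

    # Illustrative numbers: 10 cm baseline, target 1 m away
    left_pan, right_pan = symmetric_pan_angles(0.10, 1.0)
    print(np.degrees(left_pan - right_pan))  # vergence angle, about 5.7 degrees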
In this chapter we propose a novel model, called vergence-version control with attention effects (VVCA), in which object recognition is used as a channel for controlling version and vergence eye movements in a biologically-plausible way. Besides purely theoretical (simulated) results, the VVCA model has a real-world embodiment in the form of a robotic setup operating under real-time control of a version of the model adapted specifically for real-time performance.

We have also worked extensively on object recognition, and the results have been employed in all of the studies mentioned above. For appearance-based object recognition (used in the IMD and VC studies) we employ the well-known convolutional neural network (CNN) paradigm. In Chapter 3 we present and describe an extended version of the CNN, called myCNN, which can be regarded as a fusion of a conventional CNN with hierarchical cortex-like mechanisms.
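For readers unfamiliar with the CNN paradigm mentioned above, a minimal sketch of a conventional CNN classifier is given below. It is a generic illustration only: the layer sizes, class count and the use of PyTorch are assumptions for the example, and it does not reproduce the myCNN architecture of Chapter 3.

    import torch
    import torch.nn as nn

    class TinyCNN(nn.Module):
        """A minimal conventional CNN: two convolution/subsampling stages
        followed by a fully connected read-out layer."""
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),
                nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),
            )
            self.classifier = nn.Linear(16 * 5 * 5, num_classes)

        def forward(self, x):
            x = self.features(x)                  # stacked feature maps
            return self.classifier(x.flatten(1))  # class scores

    # Example: class scores for a batch of four 32x32 grayscale patches
    logits = TinyCNN()(torch.randn(4, 1, 32, 32))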