In contrast to statistical visual recognition, relational visual recognition aims at employing relational representations to solve visual recognition problems. For high-level tasks involving complex objects and scenes, low- and mid-level visual features do not always suffice. In these cases it is the component objects, their structure and semantic configuration that guide recognition. Such objects and scenes are best described in terms of relational languages or (higher-order) graphs. Relational approaches enjoyed popularity in early vision work. Although convenient at the time, given the limitations of hardware, data, learning methods and low-level vision routines, relational representations are rarely used in visual recognition today, mainly due to their purely symbolic nature. Nevertheless, recent successes in combining them with statistical learning principles, together with the maturity of the aforementioned resources, motivate us to reinvestigate their use. Starting from low- and mid-level solutions and building on top of them, (statistical) relational learning opens the perspective of moving towards more general, complete and effective relational visual recognition systems.

This thesis makes several contributions in this direction. We first introduce a new relational distance-based framework for hierarchical image understanding. Applied to the house facade domain, the relational distance achieves good detection results while demonstrating the interplay between structural and appearance-based aspects. The second contribution is the use of a kernel-based relational language for scene classification and tagging; as part of this contribution, we also employ the kernel-based language to understand images of houses. These recognition tasks share a similar relational representation and language, showing its generality and benefits. Our third contribution is a probabilistic logic pipeline for task-dependent robot grasping.
It contains a new module based on causal probabilistic logic and symbolic object parts: given a set of probabilistic observations about the world, it semantically reasons about the object category, suitable tasks and pre-grasp configurations with respect to the intended task. Experimental results, including those obtained with a real robot platform, confirm the importance of high-level reasoning and world knowledge for robot grasping, as opposed to relying solely on local object shape information. Further, in the context of robot grasping, our fourth contribution is a relational approach to numerical feature pooling. It combines numerical shape features, qualitative spatial relations and kernels for graphs to recognize graspable object points. Finally, we employ sequential statistical relational techniques to capture underlying concepts in video streams; in particular, we focus on monitoring card games and learning to detect fraudulent sequences.

Overall, the experimental results provide evidence that we can develop effective, real-world relational visual recognition systems that benefit from statistical relational learning.