Bridging Modalities and Transferring Knowledge: Enhanced Multimodal Understanding and Recognition

Publication date: 2024-06-19


Radevski, Gorjan
Tuytelaars, Tinne; Moens, Marie-Francine




In this thesis, we examine the complexities of multimodal machine learning, focusing on alignment, translation, fusion, and transference. We introduce novel methods for translating text-based spatial relations into 2D spatial arrangements, and for translating medical texts into specific 3D locations within a human anatomical atlas. Furthermore, we develop a benchmark dataset for translating structured text to canonical facts in large-scale knowledge graphs. Finally, we explore multimodal fusion methods for improved action recognition, and demonstrate a pragmatic approach to multimodal knowledge transference. Together, these contributions enhance machine understanding and recognition capabilities across a range of applications.