IEEE Transactions on Multimedia vol:12 issue:1 pages:13-27
In this paper we report on our experiments on aligning names and faces as found in images and captions of online news websites. Developing accurate technologies for linking names and faces is valuable when retrieving or mining information from multimedia collections. We perform exhaustive and systematic experiments exploiting the (a)symmetry between the visual and textual modalities. This leads to different schemes for assigning names to the faces, assigning faces to the names, and establishing name-face link pairs. In addition, we investigate generic approaches that use textual and visual structural information to predict the presence of the corresponding entity in the other modality. The proposed methods are completely unsupervised and are inspired by methods for aligning phrases and words in texts of different languages, developed for constructing dictionaries for machine translation. On the "Labeled Faces in the Wild" dataset, the results are competitive with state-of-the-art performance in terms of recall values, now reported on the complete dataset, and include excellent precision values. They also show the value of text and image analysis for identifying the probability of being pictured or named in the alignment process.
Pham P.T., Moens M.-F., Tuytelaars T., "Cross-media alignment of names and faces", IEEE Transactions on Multimedia, vol. 12, no. 1, pp. 13-27, January 2010.