Active speaker detection with audio-visual co-training

Chakravarty, Jay; Zegers, Jeroen; Tuytelaars, Tinne; Van hamme, Hugo; Nakano, YI; Andre, E; Nishida, T; Busso, C; Pelachaud, C

doi:10.1145/2993148.2993172

Proceedings ICMI 2016


Keywords:

PSI_VISICS; PSI_SPEECH; Science & Technology; Technology; Computer Science, Artificial Intelligence; Computer Science, Theory & Methods; Computer Science; Active Speaker Detection; Audio-visual Co-training; RECOGNITION; PSI_4141

Abstract:

© 2016 ACM. In this work, we show how to co-train a classifier for active speaker detection using audio-visual data. First, audio Voice Activity Detection (VAD) is used to train a personalized video-based active speaker classifier in a weakly supervised fashion. The video classifier is in turn used to train a voice model for each person. The individual voice models are then used to detect active speakers. There is no manual supervision: audio weakly supervises video classification, and the co-training loop is completed by using the trained video classifier to supervise the training of a personalized audio voice classifier.
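The loop described in the abstract can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the synthetic two-person data, the scalar features (`energy`, `pitch`, `motion`), the energy-threshold VAD, and the midpoint-threshold and nearest-mean classifiers merely stand in for whatever features and models the authors actually use. The sketch only shows the flow of supervision: audio VAD weakly labels video, and the video classifier in turn labels audio for per-person voice models.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def make_toy_data(n_frames=200, seed=0):
    """Synthetic frames (hypothetical): two people alternate, with silences between."""
    rng = random.Random(seed)
    frames = []
    for t in range(n_frames):
        # Ground-truth speaker; used only for evaluation, never for training.
        speaker = [0, None, 1, None][(t // 10) % 4]
        energy = (1.0 if speaker is not None else 0.1) + rng.uniform(-0.05, 0.05)
        # Pitch-like voice feature; carries no information during silence.
        pitch = {0: 120.0, 1: 210.0, None: 0.0}[speaker] + rng.uniform(-20, 20)
        # One mouth-motion value per person.
        motion = [(1.0 if speaker == p else 0.0) + rng.uniform(-0.1, 0.1)
                  for p in (0, 1)]
        frames.append({"energy": energy, "pitch": pitch,
                       "motion": motion, "speaker": speaker})
    return frames


def co_train(frames, vad_threshold=0.5):
    # Step 1: audio VAD yields weak labels ("someone is speaking"),
    # without saying which person it is.
    vad = [f["energy"] > vad_threshold for f in frames]

    # Step 2: per-person video classifier trained on the weak labels.
    # Positives are noisy: they include frames where the other person speaks.
    video_thresholds = []
    for p in (0, 1):
        pos = mean([f["motion"][p] for f, v in zip(frames, vad) if v])
        neg = mean([f["motion"][p] for f, v in zip(frames, vad) if not v])
        video_thresholds.append((pos + neg) / 2)

    # Step 3: the video classifier supervises a voice model for each person
    # (here simply the mean pitch of frames attributed to that person).
    voice_means = []
    for p in (0, 1):
        pitches = [f["pitch"] for f, v in zip(frames, vad)
                   if v and f["motion"][p] > video_thresholds[p]]
        voice_means.append(mean(pitches))
    return video_thresholds, voice_means


def detect_speaker(frame, voice_means, vad_threshold=0.5):
    """Audio-only detection using the learned per-person voice models."""
    if frame["energy"] <= vad_threshold:
        return None
    return min((0, 1), key=lambda p: abs(frame["pitch"] - voice_means[p]))


frames = make_toy_data()
video_thresholds, voice_means = co_train(frames)
predictions = [detect_speaker(f, voice_means) for f in frames]
accuracy = mean([pred == f["speaker"] for pred, f in zip(predictions, frames)])
print(f"accuracy on toy data: {accuracy:.2f}")
```

No ground-truth label is used during training in this sketch; the `speaker` field is read only when computing the final accuracy, which mirrors the abstract's claim that there is no manual supervision.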
