Language Resources and Evaluation, vol. 49, issue 1, pages 195-214
Research on the multimodal aspects of interactional language use requires high-quality multimodal resources. In contrast to the vast number of available written language corpora and collections of transcribed spoken language, truly multimodal corpora that include visual as well as auditory data are scarce. In this paper, we first discuss a few notable exceptions that do provide high-quality, multiple-angle video recordings of face-to-face conversations. We then present a new multimodal corpus design that adds two dimensions to the existing resources. First, the recording set-up was designed to give a full view of the dialogue partners’ gestural behaviour, including hand gestures, facial expressions and body posture. Second, by recording each participant’s perspective and behaviour during conversation, using head-mounted scene cameras and eye-trackers, we obtained a 3D landscape of the conversation, with detailed production information (scene camera and sound) and indices of cognitive processing (eye movements for gaze analysis) for both participants. In its current form, the resulting InSight Interaction Corpus consists of 15 recorded face-to-face interactions of 20 minutes each, of which five have been transcribed and annotated for a range of linguistic and gestural features using the ELAN multimodal annotation tool.