6th Information Retrieval Facility Conference (IRFC 2013), Date: 2013/10/07 - 2013/10/09, Location: Limassol, Cyprus
Lecture Notes in Computer Science
Author:
Keywords:
Named entity recognition, Term expansion, Broadcast news, Speech data, Science & Technology, Technology, Computer Science, Information Systems, Computer Science, Theory & Methods, Computer Science, SPEECH
Abstract:
We propose a new approach to improving named entity recognition (NER) in broadcast news speech data. The approach proceeds in two key steps: (1) we automatically detect document alignments between highly similar speech documents and corresponding written news stories that are easily obtainable from the Web; (2) we employ term expansion techniques commonly used in information retrieval to recover named entities that were initially missed by the speech transcriber. We show that our method finds named entities missing from the transcribed speech data and also corrects incorrectly assigned named entity tags. Consequently, our novel approach improves state-of-the-art NER results on speech data in terms of both recall and precision.
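
The two steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure or the expansion technique, so the sketch assumes bag-of-words cosine similarity for the alignment step and a simple token-overlap check for the recovery step. All function names are hypothetical, and `story_entities` is assumed to come from some NER tagger run on the written story.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_alignment(transcript, stories):
    """Step 1 (assumed form): index of the written story most similar
    to the speech transcript. Expects a non-empty list of stories."""
    t_vec = Counter(tokenize(transcript))
    scores = [cosine(t_vec, Counter(tokenize(s))) for s in stories]
    return max(range(len(stories)), key=scores.__getitem__)


def missing_entities(transcript, story_entities):
    """Step 2 (assumed form): entities from the aligned story whose
    tokens do not all occur in the transcript, i.e. candidates that
    the speech transcriber may have missed or misrecognized."""
    seen = set(tokenize(transcript))
    return [e for e in story_entities if not set(tokenize(e)) <= seen]
```

For example, given a transcript in which "Limassol" was misrecognized as "lima sol", `best_alignment` would be expected to pick the written story about the same event, and `missing_entities` would then flag "Limassol" as a candidate to restore. A real system would also need a similarity threshold so that transcripts without a matching story are left unaligned.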