6th Information Retrieval Facility Conference (IRFC 2013), Date: 2013/10/07 - 2013/10/09, Location: Limassol, Cyprus

Publication date: 2013-10-01
Volume: 8201 Pages: 45 - 57
ISBN: 978-3-642-41057-4
Publisher: Springer; Berlin

Lecture Notes in Computer Science

Author:

Shrestha, Niraj
Vulic, Ivan ; Moens, Marie-Francine ; Lupu, M ; Kanoulas, E ; Loizides, F

Keywords:

Named entity recognition; Term expansion; Broadcast news; Speech data; Speech; Science & Technology; Technology; Computer Science, Information Systems; Computer Science, Theory & Methods

Abstract:

We propose a new approach to improving named entity recognition (NER) in broadcast news speech data. The approach proceeds in two key steps: (1) we automatically detect document alignments between highly similar speech documents and corresponding written news stories that are easily obtainable from the Web; (2) we employ term expansion techniques commonly used in information retrieval to recover named entities that were initially missed by the speech transcriber. We show that our method is able to find named entities missing in the transcribed speech data, and additionally to correct incorrectly assigned named entity tags. Consequently, our novel approach improves state-of-the-art NER results from speech data in terms of both recall and precision.
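
The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure or the expansion method, so the sketch assumes TF-IDF cosine similarity for alignment and a simple "entities present in aligned stories but absent from the transcript" rule for expansion. All function names, the `threshold` value, and the toy data are hypothetical, and the entity lists are assumed to come from some NER tagger that is not shown here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple stand-in for real preprocessing."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) using a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    n = len(docs)
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
         for t, c in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def align(transcript, news_docs, threshold=0.2):
    """Step 1 (assumed form): keep written stories similar enough to the transcript."""
    vecs = tfidf_vectors([transcript] + news_docs)
    scores = [cosine(vecs[0], v) for v in vecs[1:]]
    return [(i, s) for i, s in enumerate(scores) if s >= threshold]


def expand_entities(speech_entities, news_entities, aligned):
    """Step 2 (assumed form): propose entities found in aligned stories
    but missing from the transcript, weighted by alignment score."""
    known = {e.lower() for e in speech_entities}
    candidates = Counter()
    for i, score in aligned:
        for entity in news_entities[i]:
            if entity.lower() not in known:
                candidates[entity] += score
    return [e for e, _ in candidates.most_common()]


# Toy data, invented purely for illustration.
transcript = ("the president of france met angela merkel in berlin "
              "to discuss the euro crisis")
news_docs = [
    "French President Francois Hollande met Angela Merkel in Berlin "
    "to discuss the euro crisis.",
    "Local football club wins championship after a dramatic final.",
]
speech_entities = ["Angela Merkel", "Berlin"]            # tagged in the transcript
news_entities = [["Francois Hollande", "Angela Merkel", "Berlin"], []]

aligned = align(transcript, news_docs)
recovered = expand_entities(speech_entities, news_entities, aligned)
print(aligned, recovered)
```

In this toy run only the first written story is aligned with the transcript, and the one entity it contributes is a name that the transcript lacks. A realistic system would also need to decide where in the transcript a recovered entity belongs and how to revise existing tags, which this sketch does not attempt.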