Algorithms for Temporal Information Processing of Text and their Applications (Algoritmen voor het verwerken van temporele informatie in tekst en hun toepassingen)

Publication date: 2012-03-09

Author:

Kolomiyets, Oleksandr

Abstract:

Temporal information processing of text is a complex information extraction task in which temporally relevant information in text has to be extracted and properly represented in order to be used by a machine. In general, the task concerns the major concepts of temporal cognition, such as time, events, and relations between events and times, as they are encoded in language.

This thesis explores algorithms for temporal information processing of text and focuses on the automated extraction of temporal information. Three major temporal concepts in language are identified: time expressions (expressions in text that denote time), temporal events (events that happen or last in time), and temporal relations between events and times. With respect to this distinction, temporal information processing of text can be divided into a number of corresponding sub-tasks: recognition and normalization of time expressions, recognition of events, and recognition of temporal relations between events and times. In this thesis we describe approaches for the automated recognition and representation of times and events, and for the recognition of temporal relations, performed by means of computer algorithms.
The proposed algorithms are based on supervised statistical machine learning methods that are sometimes accompanied by symbolic rule-based approaches.

In detail, the thesis contributes (i) supervised learning algorithms for temporal expression recognition in text with sparse training data, and a bootstrapping approach that addresses the sparsity problem, (ii) a novel paradigm for modularized normalization of temporal expressions based on a deep semantic analysis of temporal expression constituents, (iii) a novel annotation paradigm for temporal information that aims at a full and coherent set of annotated temporal relations, but does not require annotating an exhaustive set of temporal relations, and (iv) novel algorithms for recognizing the temporal structure of a document, composed of temporal events and temporal relations, which is an important step towards automated story understanding.