Structured Machine Learning for Mapping Natural Language to Spatial Ontologies (Gestructureerd machinaal leren voor het omzetten van natuurlijke taal naar ruimtelijke ontologieën)

Publication date: 2013-07-01

Author:

Kordjamshidi, Parisa
Moens, Marie-Francine

Keywords:

Structured machine learning, Spatial information extraction, Natural language processing, Ontology population, Spatial role labeling, Structured output prediction, Spatial meaning representation, Global inference

Abstract:

Natural language understanding is one of the fundamental goals of artificial intelligence. An essential function of natural language is to talk about the location and translocation of objects in space. Understanding spatial language is important in many applications, such as geographical information systems, human-computer interaction, the provision of navigational instructions to robots, and visualization or text-to-scene conversion.

Due to the complexity of spatial primitives and notions, and the challenges of designing ontologies for formal spatial representation, the extraction of spatial information from natural language has yet to be placed in a well-defined framework. Machine learning has not been applied systematically to the task, and no established corpora are available. In this thesis I study the problem from cognitive, linguistic, and computational points of view, with a primary focus on establishing a supervised machine learning framework.

This thesis makes five main research contributions. The first is the design of a spatial annotation scheme that bridges natural language and formal spatial representations. The scheme applies universal, commonly accepted cognitive spatial notions and multiple well-known qualitative spatial reasoning models.

The second is the definition of a novel computational linguistic task that uses the annotation scheme to map natural language to spatial ontologies. For this task I have built rich annotated corpora and an evaluation scheme.

The third is a detailed investigation of the linguistic features and structural characteristics of spatial language that aid the use of machine learning in extracting spatial roles and relations from annotated data. The learning methods used are discriminative graphical models and statistical relational learning.

The fourth is the proposal of a unified structured output learning model for ontologies. The ontology components are learnt while taking into account the ontological constraints and linguistic dependencies among the components. The ontology includes roles and relations, and multiple formal semantic types.

The fifth is the proposal of an efficient inference approach based upon constraint optimization. It can deal with a large number of variables and constraints, and makes it feasible to build a global structured learning model for ontology population. To test the approach, I have performed an empirical investigation using my spatial ontology.

The application of my proposed unified learning model for ontology population is not limited to the extraction of spatial semantics; it could be used to populate any ontology. I therefore argue that this work is an important step towards automatically describing text with semantic labels that form a structured ontological representation of the content.