Terminology and Knowledge Engineering Conference (TKE2012): New frontiers in the constructive symbiosis of terminology and knowledge engineering, Date: 2012/06/19 - 2012/06/22, Location: Madrid

Publication date: 2012-06-01
Pages: 283 - 290
ISBN: 978-84-695-4333-7
Publisher: Universidad Politecnica de Madrid; Madrid

Proceedings of the 10th Terminology and Knowledge Engineering Conference (TKE2012): New frontiers in the constructive symbiosis of terminology and knowledge engineering

Authors:

De Hertog, Dirk
Heylen, Kris

Keywords:

Computational Linguistics, Terminology

Abstract:

Many approaches to term extraction focus on multiword units, assuming that multiword units make up the majority of terms in most subject fields. However, this supposed prevalence of multiword terms has gone largely untested in the literature. In this paper, we perform a quantitative, corpus-based analysis of two claims: that multiword units are more technical than single-word units, and that multiword units are more widespread in specialized domains. As a case study, we look at Dutch terminology from the Belgian legal domain. First, the relevant units are extracted using linguistic filters and an algorithm that identifies Dutch compounds and multiword units. Second, we calculate for all units an association measure that captures the degree to which a linguistic unit belongs to the domain. Third, we analyze the relationship between the units' technicality, their frequency and their status as a simplex, compound or multiword unit.
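The abstract names neither the association measure nor the compound-identification algorithm. The sketch below is a minimal, hypothetical illustration of the second and third steps only. It assumes Dunning's log-likelihood ratio (G2) against a general-language reference corpus as a stand-in domain-association measure, and a naive lexicon-based splitter for Dutch compounds. The functions `signed_log_likelihood`, `unit_type`, `score_units` and `mean_keyness_by_type`, the toy `LEXICON`, the linking elements and the counts are all invented for illustration; they are not the authors' method or data.

```python
import math
from collections import defaultdict

# Hypothetical toy lexicon of Dutch simplex words (lowercase); not from the paper.
LEXICON = {"recht", "bank", "wet", "boek", "arrest", "huis", "burgerlijk"}

# Hypothetical linking elements tried between compound members ("" = none).
LINKING_ELEMENTS = ("", "s", "en")


def signed_log_likelihood(f_dom, n_dom, f_ref, n_ref):
    """Dunning's log-likelihood (G2) of one unit, domain vs. reference corpus.

    Positive values: the unit is relatively more frequent in the domain corpus.
    Negative values: it is relatively rarer there. Corpus sizes must be > 0.
    """
    total_f = f_dom + f_ref
    total_n = n_dom + n_ref
    expected_dom = n_dom * total_f / total_n
    expected_ref = n_ref * total_f / total_n
    g2 = 0.0
    for observed, expected in ((f_dom, expected_dom), (f_ref, expected_ref)):
        if observed > 0:  # 0 * log(0) is taken as 0
            g2 += observed * math.log(observed / expected)
    g2 *= 2.0
    return g2 if f_dom / n_dom >= f_ref / n_ref else -g2


def unit_type(unit, lexicon):
    """Naively classify a unit as 'multiword', 'compound' or 'simplex'."""
    if " " in unit:
        return "multiword"
    word = unit.lower()
    # Try every split with a head and a remainder of at least two characters.
    for i in range(2, len(word) - 1):
        head = word[:i]
        if head not in lexicon:
            continue
        rest = word[i:]
        for link in LINKING_ELEMENTS:
            if rest.startswith(link) and rest[len(link):] in lexicon:
                return "compound"
    return "simplex"


def score_units(domain_counts, ref_counts, n_dom, n_ref, lexicon):
    """Return (unit, type, domain frequency, keyness) rows for every domain unit."""
    rows = []
    for unit, f_dom in domain_counts.items():
        keyness = signed_log_likelihood(f_dom, n_dom, ref_counts.get(unit, 0), n_ref)
        rows.append((unit, unit_type(unit, lexicon), f_dom, keyness))
    return rows


def mean_keyness_by_type(rows):
    """Average keyness per unit type (simplex / compound / multiword)."""
    grouped = defaultdict(list)
    for _unit, utype, _freq, keyness in rows:
        grouped[utype].append(keyness)
    return {utype: sum(vals) / len(vals) for utype, vals in grouped.items()}


if __name__ == "__main__":
    # Invented toy counts; corpus sizes are token totals, assumed known.
    domain = {"rechtbank": 50, "wetboek": 30, "arrest": 40, "huis": 10,
              "burgerlijk wetboek": 20}
    reference = {"rechtbank": 5, "wetboek": 2, "arrest": 3, "huis": 200,
                 "burgerlijk wetboek": 1}
    rows = score_units(domain, reference, 10_000, 100_000, LEXICON)
    for row in rows:
        print(row)
    print(mean_keyness_by_type(rows))
```

Signing the G2 score keeps under-represented units (such as the general-language word "huis" in this toy data) from looking "technical" merely because their frequencies differ strongly between corpora. A real system would use a full morphological lexicon and a more careful compound splitter than this string-matching heuristic.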