
International Journal of Corpus Linguistics

Publication date: 2013-01-01
Volume: 18 Pages: 536 - 560
Publisher: John Benjamins Pub. Co.

Authors:

Bertels, Ann
Speelman, Dirk

Keywords:

Social Sciences, Linguistics, Language & Linguistics, keyword extraction, probability value, typicality coefficient, reference corpus, 1702 Cognitive Sciences, 2004 Linguistics, Languages & Linguistics, 4703 Language studies, 4704 Linguistics

Abstract:

This paper explores two tools and methods for keyword extraction. Of the several tools available, it compares two widely used ones, namely Lexico3 (Lamalle et al. 2003) and WordSmith Tools (Scott 2013). It shows the importance of keywords and discusses recent studies involving keyword extraction. No previous study has attempted to compare two tools that are used by different language communities and rely on different methodologies to extract keywords; this paper aims to fill that gap by comparing not only the tools and their practical use, but also the underlying methodologies and statistics. By means of a comparative study on a small test corpus, it shows major similarities and differences between the tools. The similarities mainly concern the most typical keywords, whereas the differences concern the total number of significant keywords extracted, the granularity of both the probability value and the typicality coefficient, and the type of reference corpus. © 2013 John Benjamins Publishing Company.