Title: Exploring the feasibility and accuracy of Latent Semantic Analysis based text mining techniques to detect similarity between patent documents and scientific publications
Authors: Magerman, Tom ×
Van Looy, Bart
Song, Xiaoyan #
Issue Date: Feb-2010
Publisher: Springer & Akadémiai Kiadó
Series Title: Scientometrics vol:82 issue:2 pages:289-306
Abstract: In this study, we examine and validate the use of existing text mining techniques (based on the vector space model and latent semantic indexing) to detect similarities between patent documents and scientific publications. Clearly, experts involved in domain studies would benefit from techniques that allow similarity to be detected, and hence facilitate mapping, categorization and classification efforts. In addition, given current debates on the relevance and appropriateness of academic patenting, the ability to assess content-relatedness between sets of documents (in this case, patents and publications) might become relevant and useful. We list several options available to arrive at content-based similarity measures. Different options of a vector space model and latent semantic indexing approach have been selected and applied to the publications and patents of a sample of academic inventors (n = 6). We also validated the outcomes by using independently obtained validation scores of human raters. While we conclude that text mining techniques can be valuable for detecting similarities between patents and publications, our findings also indicate that the various options available to arrive at similarity measures vary considerably in terms of accuracy: some generally accepted text mining options, like dimensionality reduction and LSA, do not yield the best results when working with smaller document sets. Implications and directions for further research are discussed.
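To make the two families of options in the abstract concrete, here is a minimal sketch, not the authors' pipeline, of how a patent/publication pair can be scored (a) by cosine similarity in a plain vector space model and (b) after latent semantic indexing, i.e. a rank-k truncated SVD of the term-by-document matrix. The tokenizer, the tf * log(N/df) weighting, the choice k = 2 and the four-document toy corpus are all assumptions made for illustration; the helper names (`tfidf_matrix`, `lsa_doc_vectors`, etc.) are hypothetical.

```python
import math
import re
from collections import Counter

import numpy as np


def tokenize(text: str) -> list[str]:
    # Assumed preprocessing: lowercase, keep alphabetic runs; no stemming or stop-word removal.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_matrix(docs: list[str]) -> np.ndarray:
    # Term-by-document matrix with tf * log(N / df) weights (one assumed weighting scheme).
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    n_docs = len(docs)
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    A = np.zeros((len(vocab), n_docs))
    for j, c in enumerate(counts):
        for i, t in enumerate(vocab):
            A[i, j] = c[t] * math.log(n_docs / df[t])
    return A


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    # Cosine of the angle between two vectors; 0.0 if either vector is all zeros.
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return 0.0 if nu == 0 or nv == 0 else float(u @ v / (nu * nv))


def lsa_doc_vectors(A: np.ndarray, k: int) -> np.ndarray:
    # Truncated SVD A ~ U_k S_k V_k^T; each document becomes one row of (S_k V_k^T)^T.
    _, s, vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    return (np.diag(s[:k]) @ vt[:k, :]).T


# Hypothetical toy corpus: two "patents" followed by two "publications".
docs = [
    "polymer membrane filtration of water",
    "laser diode optical amplifier",
    "water filtration using a polymer membrane",
    "semiconductor laser optical gain",
]
A = tfidf_matrix(docs)
vsm_02 = cosine(A[:, 0], A[:, 2])  # full vector space model, approx. 0.408
vsm_03 = cosine(A[:, 0], A[:, 3])  # no shared terms, 0.0
L = lsa_doc_vectors(A, k=2)
lsa_02 = cosine(L[0], L[2])        # rank-2 latent space, approx. 1.0
print(round(vsm_02, 3), round(vsm_03, 3), round(lsa_02, 3))
```

On this toy corpus the same patent/publication pair scores about 0.41 in the full term space but about 1.0 after reduction to two latent dimensions, because the SVD collapses each topic onto a single axis. This illustrates how much the choice of option can move a similarity score; which score better matches human judgment is the empirical question the study addresses with its rater validation, and nothing in this sketch answers it.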
ISSN: 0138-9130
Publication status: published
KU Leuven publication type: IT
Appears in Collections: Department of Managerial Economics, Strategy and Innovation (MSI), Leuven
× corresponding author
# (joint) last author

Files in This Item:
File: 2010-04-13 - Magerman et al TXT Mining.pdf
Status: Published
Size: 364 Kb
Format: Adobe PDF

These files are only available to some KU Leuven Association staff members


All items in Lirias are protected by copyright, with all rights reserved.
