Title: Information extraction in structured documents using tree automata induction
Authors: Kosala, Raymondus ×
Van den Bussche, Jan
Bruynooghe, Maurice
Blockeel, Hendrik #
Issue Date: 2002
Publisher: Springer
Host Document: Lecture notes in computer science vol:2431 pages:299-310
Conference: 6th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2002) location:Helsinki, Finland date:August 19-23, 2002
Abstract: Information extraction (IE) addresses the problem of extracting specific information from a collection of documents. Much of the previous work on IE from structured documents formatted in HTML or XML uses techniques for IE from strings, such as grammar and automata induction. However, such documents have a tree structure. Hence it is natural to investigate methods that are able to recognise and exploit this tree structure. We do this by exploring the use of tree automata for IE in structured documents. Experimental results on benchmark data sets show that our approach compares favorably with previous approaches.
ISSN: 0302-9743
Publication status: published
KU Leuven publication type: IC
Appears in Collections: Informatics Section
× corresponding author
# (joint) last author

Files in This Item:
File Status Size Format
38956.pdf Published 226Kb Adobe PDF

