Title: Using biased discriminant analysis for email filtering
Authors: Gomez, Juan Carlos; Moens, Marie-Francine
Issue Date: 2010
Publisher: Springer
Host Document: Lecture Notes in Computer Science vol:6276 pages:566-575
Conference: 14th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems location:Cardiff, Wales, UK date:8-10 September 2010
Abstract: This paper reports on email filtering based on content features. We test the validity of a novel statistical feature extraction method, which relies on dimensionality reduction to retain the most informative and discriminative features from messages. The approach, named Biased Discriminant Analysis (BDA), aims at finding a feature space transformation that closely clusters positive examples while pushing away the negative ones. This method is an extension of Linear Discriminant Analysis (LDA), but it introduces a different transformation to improve the separation between classes, and it has not yet been applied to text mining tasks.
We successfully test BDA under two schemas. The first is a traditional classification scenario using 10-fold cross-validation on four standard ground-truth corpora: LingSpam, SpamAssassin, the Phishing corpus, and a subset of the TREC 2007 spam corpus. In the second schema, we test the anticipatory properties of the statistical features with the TREC 2007 spam corpus.
The contribution of this work is the evidence that BDA offers better discriminative features for email filtering, gives stable classification results regardless of the number of features chosen, and yields features that robustly retain their discriminative value over time.
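Illustrative sketch (not taken from the paper): the abstract describes BDA as a transformation that clusters positive examples closely while pushing negative ones away. One common formulation in the literature maximizes the scatter of negatives around the positive-class mean relative to the scatter of positives around that same mean, which leads to a generalized eigenproblem. The Python code below follows that formulation under stated assumptions. The function name bda_projection, the parameter names, and the simple ridge regularization are hypothetical choices made for illustration. This record does not specify the exact variant, feature representation, or regularization used by the authors.

```python
import numpy as np


def bda_projection(X_pos, X_neg, n_components=2, reg=1e-3):
    """Sketch of a BDA-style projection (assumed formulation, not the paper's code).

    X_pos: (n_pos, d) feature vectors of the positive class.
    X_neg: (n_neg, d) feature vectors of the negative class.
    Returns a (d, n_components) matrix W; project data with X @ W.
    """
    d = X_pos.shape[1]
    m_pos = X_pos.mean(axis=0)

    # Both scatters are measured around the POSITIVE mean; this is the "bias"
    # that distinguishes the formulation from two-class LDA.
    P = X_pos - m_pos
    N = X_neg - m_pos
    S_pos = P.T @ P / len(X_pos)
    S_neg = N.T @ N / len(X_neg)

    # Ridge term (an assumption) keeps S_pos positive definite.
    S_pos_reg = S_pos + reg * np.eye(d)

    # Solve S_neg w = lambda * S_pos_reg w by whitening with a Cholesky factor.
    L = np.linalg.cholesky(S_pos_reg)      # S_pos_reg = L @ L.T
    L_inv = np.linalg.inv(L)
    A = L_inv @ S_neg @ L_inv.T            # symmetric, so eigh applies
    eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order

    # Keep the directions with the largest negative-to-positive scatter ratio.
    top = np.argsort(eigvals)[::-1][:n_components]
    return L_inv.T @ eigvecs[:, top]
```

In an email-filtering setting, X_pos and X_neg would hold content feature vectors of the class treated as positive and of the remaining messages, and the projected features X @ W would then be passed to a classifier. That pipeline is an assumption about usage, not a description of the authors' experimental setup.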
ISSN: 0302-9743
Publication status: published
KU Leuven publication type: IC
Appears in Collections:Informatics Section

Files in This Item:
File | Description | Status | Size | Format
Gomez-BDAEmailFiltering.pdf | Main article | Published | 307Kb | Adobe PDF

