IEEE Transactions on Information Forensics and Security vol:5 issue:4 pages:837-847
Hidden salting in digital media involves the intentional addition or distortion of content patterns with the purpose of misleading content filters.
We propose a method for detecting portions of a digital text source that are invisible to the end user when rendered on a visual medium (such as a computer monitor). The method consists of "tapping" into the rendering process and analyzing the rendering commands to identify portions of the source text (plaintext) that will be invisible to a human reader, using criteria based on text character and background colours, font size, overlapping characters, etc. Moreover, the text deemed visible (covertext) is reconstructed from the rendering commands, and the character reading order, which may differ from the rendering order, is identified. The detection and resolution of hidden salting is evaluated on two email corpora, and the effectiveness of the method in a spam filtering task is assessed. We thus provide a solution to a relevant open problem in content filtering applications, namely the presence of tricks aimed at circumventing automatic filters.
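
As an illustration only, the visibility criteria named above (character and background colours, font size, overlapping characters) and the recovery of reading order can be sketched in Python. This is a minimal sketch under stated assumptions, not the procedure evaluated in the paper: the `Glyph` record, the three thresholds, and the top-to-bottom, left-to-right reading order are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

RGB = Tuple[int, int, int]


@dataclass
class Glyph:
    """One intercepted rendering command (hypothetical record layout)."""
    char: str
    x: float          # left edge of the glyph box
    y: float          # top edge of the glyph box
    width: float
    height: float
    font_size: float
    fg: RGB           # character colour
    bg: RGB           # background colour behind the character


# Illustrative thresholds; these values are assumptions, not from the paper.
MIN_FONT_SIZE = 4.0
MIN_COLOUR_DISTANCE = 30.0
MAX_OVERLAP_RATIO = 0.5


def colour_distance(a: RGB, b: RGB) -> float:
    """Euclidean distance between two colours in RGB space."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def overlap_ratio(g: Glyph, other: Glyph) -> float:
    """Fraction of g's box that is covered by other's box."""
    dx = min(g.x + g.width, other.x + other.width) - max(g.x, other.x)
    dy = min(g.y + g.height, other.y + other.height) - max(g.y, other.y)
    area = g.width * g.height
    if dx <= 0 or dy <= 0 or area <= 0:
        return 0.0
    return (dx * dy) / area


def is_hidden(index: int, glyphs: List[Glyph]) -> bool:
    """Apply the three criteria to the glyph at position index in rendering order."""
    g = glyphs[index]
    if g.font_size < MIN_FONT_SIZE:
        return True
    if colour_distance(g.fg, g.bg) < MIN_COLOUR_DISTANCE:
        return True
    # Simplification: any glyph rendered later is treated as an occluder.
    return any(overlap_ratio(g, later) > MAX_OVERLAP_RATIO
               for later in glyphs[index + 1:])


def covertext(glyphs: List[Glyph]) -> str:
    """Keep visible glyphs and re-order them by position, not rendering order."""
    visible = [g for i, g in enumerate(glyphs) if not is_hidden(i, glyphs)]
    visible.sort(key=lambda g: (g.y, g.x))  # assumed reading order
    return "".join(g.char for g in visible)
```

Given glyphs listed in rendering order, `covertext` drops those that fail any criterion and returns the remaining characters sorted by position, which is how a rendering order that differs from the reading order would be resolved in this simplified setting.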