Literature-based concept profiles for gene annotation: the issue of weighting
Jelier, Rob; Schuemie, Martijn J; Roes, Peter-Jan; van Mulligen, Erik M; Kors, Jan A
Elsevier Science Ireland Ltd.
International Journal of Medical Informatics vol:77 issue:5 pages:354-62
Text mining has been used to link biomedical concepts, such as genes or biological processes, to each other for annotation purposes or to generate new hypotheses. To relate two concepts to each other, several authors have used the vector space model, because vectors can be compared efficiently and transparently. In this model, a concept is characterized by a list of associated concepts, together with weights that indicate the strength of each association. The associated concepts in the vectors and their weights are derived from a set of documents linked to the concept of interest. An important issue with this approach is how to determine the weights of the associated concepts. Various schemes have been proposed to determine these weights, but no comparative studies of the different approaches are available. Here we compare several weighting approaches in a large-scale classification experiment.
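To make the vector space idea concrete, the sketch below builds a concept profile (a mapping from associated concept to weight) from the documents linked to a concept, then compares two profiles with cosine similarity. The abstract does not name the weighting schemes that were compared, so the two used here are illustrative assumptions only: `frequency` (the fraction of linked documents mentioning the associated concept) and `tfidf` (that fraction damped by how common the concept is in the whole collection). The function names, the toy concept sets, and the `doc_freq`/`n_docs` inputs are all hypothetical.

```python
import math
from collections import Counter


def concept_profile(docs, doc_freq, n_docs, scheme="frequency"):
    """Build a concept profile: associated concept -> weight (illustrative sketch).

    docs:     list of sets, each holding the concept IDs found in one document
              linked to the concept of interest.
    doc_freq: hypothetical input giving, for every concept in docs, the number
              of documents in the whole collection that mention it.
    n_docs:   total number of documents in the collection.
    """
    counts = Counter(c for doc in docs for c in doc)
    profile = {}
    for concept, n in counts.items():
        share = n / len(docs)  # fraction of linked documents mentioning the concept
        if scheme == "frequency":
            profile[concept] = share
        elif scheme == "tfidf":
            # Assumed scheme: down-weight concepts that are common everywhere.
            profile[concept] = share * math.log(n_docs / doc_freq[concept])
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return profile


def cosine(p, q):
    """Cosine similarity between two sparse profiles stored as dicts."""
    dot = sum(w * q[c] for c, w in p.items() if c in q)
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Toy data (hypothetical): concepts found in documents linked to two genes.
gene_a_docs = [{"apoptosis", "p53", "cell cycle"}, {"apoptosis", "DNA repair"}]
gene_b_docs = [{"apoptosis", "DNA repair"}, {"cell cycle", "mitosis"}]
doc_freq = {"apoptosis": 50, "p53": 5, "cell cycle": 80, "DNA repair": 20, "mitosis": 30}
n_docs = 1000

for scheme in ("frequency", "tfidf"):
    pa = concept_profile(gene_a_docs, doc_freq, n_docs, scheme)
    pb = concept_profile(gene_b_docs, doc_freq, n_docs, scheme)
    print(scheme, round(cosine(pa, pb), 3))
```

The two schemes generally give different similarity scores for the same pair of genes, which is the practical reason the choice of weighting matters when profiles are used for annotation or classification.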