4th International Conference (IWBBIO 2016), Date: 2016/04/01 - 2016/04/01, Location: Spain
Lecture Notes in Computer Science
Author:
Keywords:
SISTA, Science & Technology, Life Sciences & Biomedicine, Technology, Biochemical Research Methods, Computer Science, Information Systems, Engineering, Biomedical, Mathematical & Computational Biology, Biochemistry & Molecular Biology, Computer Science, Engineering, ONTOLOGY, TOOL
Abstract:
© Springer International Publishing Switzerland 2016. Text mining is popular in biomedical applications because it allows retrieving highly relevant information. For our purposes, it is particularly useful for linking diseases to the genes involved in them. However, text mining involves multiple challenges, such as (1) recognizing named entities (e.g., diseases and genes) inside the text, (2) constructing specific vocabularies that efficiently represent the available text, and (3) applying the correct statistical criteria to link biomedical entities with each other. We have previously developed Beegle, a tool that allows prioritizing genes for any search query of interest. The method starts with a search phase, where genes relevant to the query are identified via the literature. Once these known genes are identified, a second phase prioritizes novel candidate genes through a data fusion strategy. Many aspects of our method could potentially be improved. Here we evaluate two MEDLINE annotators that recognize biomedical entities inside a given abstract using different dictionaries and annotation strategies. We compare the contribution of each annotator to associating genes with diseases under different vocabulary settings. Somewhat surprisingly, with fewer recognized entities and a more compact vocabulary, we obtain better associations between genes and diseases. We also propose a novel but simple association criterion to link genes with diseases, which relies on recognizing only gene entities inside the biomedical text. These refinements significantly improve the performance of our method.