Title: Top-down clustering for protein subfamily identification
Authors: De Paula Costa, Eduardo * ×
Vens, Celine *
Blockeel, Hendrik #
Issue Date: 2013
Series Title: Evolutionary Bioinformatics Online vol:9 pages:185-202
Abstract: We propose a novel method for protein subfamily identification, that is, finding subgroups of functionally closely related sequences within a protein family. In line with phylogenomic analysis, the method first builds a hierarchical tree from a multiple alignment of the protein sequences, then uses a post-pruning procedure to extract clusters from the tree. Unlike existing methods, it constructs the hierarchical tree top-down rather than bottom-up, and associates particular mutations with each division into subclusters. The motivating hypothesis is that this approach may yield a better tree topology, and hence more accurate subfamily identification, while also indicating functionally important sites and allowing easy classification of new proteins. A thorough experimental evaluation confirms the hypothesis: the novel method yields more accurate clusters and a better tree topology than the state-of-the-art method SCI-PHY, identifies known functional sites, and identifies mutations that, alone, allow new sequences to be classified with an accuracy approaching that of hidden Markov models.
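Illustration (not part of the original record): the top-down idea in the abstract can be sketched roughly as below. This is a toy sketch under stated assumptions, not the authors' implementation. The function names (heterogeneity, best_split, build_tree, classify), the consensus-mismatch split criterion, and the min_split_size stopping rule are all hypothetical choices made for the example, and the post-pruning step mentioned in the abstract is omitted.

```python
from collections import Counter


def heterogeneity(seqs):
    """Count residues that differ from the column-wise consensus (assumed criterion)."""
    if not seqs:
        return 0
    return sum(len(seqs) - Counter(col).most_common(1)[0][1] for col in zip(*seqs))


def best_split(seqs):
    """Return (gain, position, residue) for the test that lowers heterogeneity most, or None."""
    base = heterogeneity(seqs)
    best = None
    for pos in range(len(seqs[0])):
        for residue in sorted({s[pos] for s in seqs}):
            yes = [s for s in seqs if s[pos] == residue]
            no = [s for s in seqs if s[pos] != residue]
            if not yes or not no:
                continue  # the test does not divide the cluster
            gain = base - heterogeneity(yes) - heterogeneity(no)
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, pos, residue)
    return best


def build_tree(seqs, min_split_size=3):
    """Recursively divide aligned sequences top-down; each inner node stores its test."""
    split = best_split(seqs) if len(seqs) >= min_split_size else None
    if split is None:
        return {"leaf": sorted(seqs)}
    _, pos, residue = split
    yes = [s for s in seqs if s[pos] == residue]
    no = [s for s in seqs if s[pos] != residue]
    return {
        "test": (pos, residue),
        "yes": build_tree(yes, min_split_size),
        "no": build_tree(no, min_split_size),
    }


def classify(tree, seq):
    """Assign a new aligned sequence to a cluster by following the residue tests."""
    while "leaf" not in tree:
        pos, residue = tree["test"]
        tree = tree["yes"] if seq[pos] == residue else tree["no"]
    return tree["leaf"]


# Toy alignment: four aligned sequences of equal length (made-up data).
alignment = ["ACDA", "ACDC", "GCEA", "GCEC"]
tree = build_tree(alignment)
cluster = classify(tree, "ACDG")
```

Because every inner node records a (position, residue) test, the same tree that defines the clusters can place a new aligned sequence by checking only those positions, which is the property the abstract refers to as easy classification of new proteins.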
ISSN: 1176-9343
Publication status: published
KU Leuven publication type: IT
Appears in Collections: Informatics Section
Public Health and Primary Care, Campus Kulak Kortrijk
Department of Public Health miscellaneous
* (joint) first author
× corresponding author
# (joint) last author

Files in This Item:
File | Description | Status | Size | Format
MLPaperProteinSubfamily.pdf | pre-print | Accepted | 378Kb | Adobe PDF
NJeditedEnolaseLastMerge.jpeg | Figure S3 (supplemental material) | Accepted | 2043Kb | JPEG
clusTreeEnolaseNumbers.jpeg | Figure S4 (supplemental material) | Accepted | 3595Kb | JPEG
sciphyTreeEnolase.jpeg | Figure S5 (supplemental material) | Accepted | 2943Kb | JPEG
clusTreeEditedTreeEnolase.jpeg | Figure S1 (supplemental material) | Accepted | 1465Kb | JPEG
sciphyeditedEnolase.jpeg | Figure S2 (supplemental material) | Accepted | 1416Kb | JPEG


All items in Lirias are protected by copyright, with all rights reserved.
