Title: Heterogeneous information sources for bioinformatics: integration methodology, search algorithms and case studies
Other Titles: Heterogene informatiebronnen voor bio-informatica: integratiemethodologie, zoekalgoritmen en gevallenstudies
Authors: Bonachela Capdevila, Francisco
Issue Date: 4-Oct-2012
Abstract: Identifying the genetic basis of Mendelian disorders or complex phenotypes is essential in human genetics in order to design more effective treatments and, eventually, to better understand the molecular mechanisms behind these genetic disorders. Usually, a list of candidates is obtained in a high-throughput experiment, such as a genome-wide association study. This set of genes (either a chromosomal region or a list of genes scattered across the genome) is usually too large to validate manually one by one, and therefore a selection of the most promising candidate genes is needed. This problem is known as gene prioritization, and in recent years several computational approaches have been proposed to address it. This thesis presents work on gene prioritization.
The first part of this text thoroughly reviews the web-based gene prioritization tools that are freely available to any user. We describe seventeen tools and stress their similarities and differences, with the aim of helping users choose the most appropriate one for their type of data. We have also reviewed the literature associated with these tools in search of validations and performance comparisons, and we have set up a website where this information and regular updates are stored. In the last two years, the number of tools described on the website has almost doubled.
Furthermore, we have carried out a performance comparison of gene prioritization tools, using either the whole genome or a limited set as the starting candidate set. We have compared the results of individual tools with those of combinations of tools, and we have completed our review by applying the combination of the best-performing tools in our benchmark to three real-life experiments. All the expertise gathered in our complete review has been used to find new candidate genes involved in congenital heart disease, congenital diaphragmatic hernia and asthma.
Finally, we propose the use of cluster analysis as a preprocessing step for gene prioritization approaches that use training genes to guide the prioritization. We claim that the automatic selection of a homogeneous training set produces more accurate rankings than expert-selected training sets. To this end, we have applied a transactional clustering algorithm, CLOPE, to two different gene prioritization tools: Endeavour and GeneDistiller.
Table of Contents:
1. Introduction
1.1. Human Genetics
1.2. Bioinformatics
1.3. Gene Prioritization
1.3.1. Candidate Set
1.3.2. Training Set
1.4. Cluster Analysis
1.4.1. Types of data
1.4.2. Traditional clustering approaches
1.4.3. Categorical clustering
1.4.4. Transactional clustering
1.4.5. Conclusion
1.5. Aims and objectives
1.6. Structure of the thesis and personal contribution
2. A guide to web tools to prioritize candidate genes
3. An unbiased evaluation of gene prioritization tools
4. Combination of gene prioritization tools gives an insight into disease gene discovery
5. A clustering based preprocessing method for gene prioritization
6. Conclusion
6.1. Overview
6.2. Clustering analysis and gene prioritization
6.3. Other lines of research
6.3.1. Haematlas
6.3.2. Daphnia and biclustering
7. Appendix A
8. Appendix B
9. Bibliography
10. List of publications
11. Curriculum vitae
Publication status: published
KU Leuven publication type: TH
Appears in Collections: Computer Science, Campus Kulak Kortrijk
Chemistry, Campus Kulak Kortrijk
ESAT - STADIUS, Stadius Centre for Dynamical Systems, Signal Processing and Data Analytics

Files in This Item:
File: Manuscript_BonachelaCapdevila.pdf
Status: Published
Size: 3530Kb
Format: Adobe PDF

These files are only available to some KU Leuven Association staff members


All items in Lirias are protected by copyright, with all rights reserved.