ITEM METADATA RECORD
Title: A Computational Framework for Prioritization of Disease-causing Mutations
Other Titles: Een computationeel raamwerk voor de rangschikking van ziekteveroorzakende mutaties
Authors: Popovic, Dusan
Issue Date: 6-Nov-2014
Abstract: Approximately eight percent of the total population is affected by one of more than seven thousand identified genetic disorders. The causes of many of these disorders are poorly understood, which complicates disease management and, in some cases, increases morbidity and mortality. At the same time, the rapid development of high-throughput technologies in the past few decades has given a considerable boost to biomarker discovery in general. Among these techniques, exome sequencing appears to be an especially promising approach for identifying novel genes that cause inheritable diseases. However, each individual genome typically harbors thousands of mutations, so detecting the disease-causing ones remains a challenging task, even when the majority of the putatively neutral variation is filtered out beforehand. Several computational methods have been proposed to assist this process, but most of them are not precise enough to be used in a real-life environment. We propose a novel method, based on genomic data fusion, for prioritization of single nucleotide variants that cause rare genetic disorders. It implements several key innovations that result in an approximately 10-fold increase in prioritization performance compared to the rest of the state of the art. First, it blends together conservation scores, haploinsufficiency and various impact prediction scores, practically subsuming all the other major algorithms. Second, it is the first of its kind to fully exploit phenotype-specific information. Third, it is directly trained to distinguish rare disease-causing variants from rare neutral ones, instead of using common polymorphisms as a proxy. We also describe several strategies for aggregating predictions across multiple phenotypes and explore how each of them affects the prioritization under different levels of noise.
In addition, we formulate a simplified version of the model to make the decision-making process more interpretable and to reduce the storage demand and computational burden of the system. Finally, we identify a bias originating from the hierarchically granular nature of the problem's data domain and develop a sampling-based way to bypass it, which translates into a considerable additional increase in the system's performance.
Table of Contents: Abstract
Abbreviations
Contents
List of Figures
List of Tables

1 Introduction
1.1 Two views on the in-silico biomarker discovery
1.1.1 Examples of biomarker discovery by feature selection
1.1.2 Examples of biomarker discovery by classification
1.1.3 Strengths and weaknesses of two paradigms
1.2 Prioritization of the disease-causing mutations
1.2.1 Limitations of the existing methods
1.2.2 Main research goals addressed in the thesis
1.3 Organizational context of the research project
1.4 Structure of the thesis text

2 Variant prioritization by genomic data fusion
2.1 Main
2.2 Methods
2.2.1 Data generation
2.2.2 Classifier benchmarks
2.2.3 Control-set benchmarks
2.2.4 Temporal stratification analysis
2.2.5 Feature importance analysis
2.3 Supplementary material
2.3.1 Supplementary Note 1 - Data generation
2.3.2 Supplementary Note 2 - Performance measures and their interpretation
2.3.3 Supplementary figures and tables

3 Aggregation of prioritization scores across phenotypes
3.1 Introduction
3.2 Material and Methods
3.2.1 Benchmark
3.2.2 Non-parametric Order Statistics
3.2.3 Parametric Modeling
3.3 Results
3.4 Discussion

4 Interpretation of the model
4.1 Introduction
4.2 Materials and methods
4.2.1 The method
4.2.2 The data
4.3 Results and discussion
4.4 Conclusions

5 Model improvements
5.1 Background
5.2 Methods
5.2.1 Hierarchical sampling
5.2.2 Experiments with synthetic data
5.2.3 Experiments with eXtasy data
5.3 Results and Discussion
5.4 Conclusions

6 Conclusions
Bibliography
Curriculum vitae
List of publications
ISBN: 978-94-6018-905-0
Publication status: published
KU Leuven publication type: TH
Appears in Collections: ESAT - STADIUS, Stadius Centre for Dynamical Systems, Signal Processing and Data Analytics

Files in This Item:
File: DP_final_thesis_text.pdf
Status: Published
Size: 9003Kb
Format: Adobe PDF

These files are only available to some KU Leuven Association staff members

All items in Lirias are protected by copyright, with all rights reserved.