Predictive Quantitative Structure-Activity Relationship Models and their use for the Efficient Screening of Molecules (Automatisch leren van structuur-activiteitsrelaties met hoge voorspellende kracht en hun toepassing bij het efficiënt screenen van moleculen)

Publication date: 2011-08-30

Author:

De Grave, Kurt
De Raedt, Luc ; Ramon, Jan

Abstract:

We explore two ways in which machine learning can support drug discovery: predictive models of the in vivo or in vitro effects of molecules (known as Quantitative Structure-Activity Relationship, or QSAR, models), and the selection of efficient experiments based on such models.

In the first part, we present methods to improve the predictive power of molecule classifiers based on graph kernels. The inductive bias of existing graph kernels can be improved by augmenting atom-bond graphs with functional groups. This novel representation lets a machine learning algorithm use both high-level functional information and low-level atomic information, without any change to the kernel or the learning algorithm. In internal validation tests, we observe consistently higher AUROC values (area under the ROC curve) for all tested kernels. We also introduce a novel, efficient graph kernel called the Neighborhood Subgraph Pairwise Distance Kernel. Its features are pairs of topological balls (neighborhood subgraphs of bounded radius around a root atom) together with the distance between their roots. Using this kernel, a standard support vector machine outperforms existing methods in predicting every investigated target property: mutagenicity, in vivo toxicity, antiviral activity, and cancer suppression.

In the second part, we address efficient experimentation in drug discovery through optimization guided by a learned surrogate model, and we evaluate different experiment selection strategies. We extend the algorithm to meet the needs of drug discovery, such as selecting many experiments to run in parallel. The algorithm is integrated into an automated drug discovery platform, the robot scientist Eve. We also apply it to optimizing the design of nanofiltration membranes.
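The abstract describes the feature space of the Neighborhood Subgraph Pairwise Distance Kernel in a single sentence. The Python sketch below illustrates that idea under several assumptions. It is not the thesis's implementation. Molecules are represented as adjacency dicts of heavy atoms with element labels. For every pair of vertices `u`, `v` at shortest-path distance `d <= max_d`, and every radius `r <= max_r`, the sketch counts the feature `(r, d, ball(u, r), ball(v, r))`. The function names, the default limits, and the per-`(r, d)` normalization are illustrative choices. The ball label is a simplified distance/label signature rather than a proper canonical labelling of each subgraph, so two non-isomorphic balls could in principle receive the same label.

```python
from collections import Counter, deque
from math import sqrt


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def ball_label(root, r, labels, dist):
    """Approximate invariant label of the ball of radius r around root.

    Simplification (assumption): each vertex in the ball is described by its
    distance to the root, its atom label, and the sorted multiset of
    (graph distance, label) pairs to the other ball vertices. This is NOT an
    exact canonical form of the subgraph.
    """
    ball = [x for x, d in dist[root].items() if d <= r]
    sigs = sorted(
        (dist[root][x], labels[x],
         tuple(sorted((dist[x][y], labels[y]) for y in ball if y != x)))
        for x in ball
    )
    return (r, tuple(sigs))


def nspdk_features(adj, labels, max_r=2, max_d=3):
    """Count features (r, d, ball_u, ball_v) for vertex pairs with d(u, v) <= max_d."""
    dist = {u: bfs_distances(adj, u) for u in adj}
    balls = {(u, r): ball_label(u, r, labels, dist)
             for u in adj for r in range(max_r + 1)}
    feats = Counter()
    nodes = sorted(adj)
    for i, u in enumerate(nodes):
        for v in nodes[i:]:  # unordered pairs, including u == v (d = 0)
            d = dist[u].get(v)
            if d is None or d > max_d:
                continue
            for r in range(max_r + 1):
                # Sort the two ball labels so the feature ignores pair order.
                a, b = sorted((balls[(u, r)], balls[(v, r)]))
                feats[(r, d, a, b)] += 1
    return feats


def nspdk_kernel(f, g, max_r=2, max_d=3):
    """Sum over (r, d) of the normalized dot products of feature counts."""
    total = 0.0
    for r in range(max_r + 1):
        for d in range(max_d + 1):
            fr = {k: c for k, c in f.items() if k[0] == r and k[1] == d}
            gr = {k: c for k, c in g.items() if k[0] == r and k[1] == d}
            dot = sum(c * gr.get(k, 0) for k, c in fr.items())
            nf = sqrt(sum(c * c for c in fr.values()))
            ng = sqrt(sum(c * c for c in gr.values()))
            if nf > 0 and ng > 0:
                total += dot / (nf * ng)
    return total


if __name__ == "__main__":
    # Heavy-atom graphs: ethanol (C-C-O) and methanol (C-O).
    ethanol = nspdk_features({0: {1}, 1: {0, 2}, 2: {1}}, {0: "C", 1: "C", 2: "O"})
    methanol = nspdk_features({0: {1}, 1: {0}}, {0: "C", 1: "O"})
    print(nspdk_kernel(ethanol, methanol))
```

The kernel values from such a function can be arranged into a Gram matrix. A standard support vector machine, for example scikit-learn's `SVC(kernel="precomputed")`, can then be trained on that matrix. This is the kind of setup the abstract describes, although the thesis's exact training pipeline is not specified here.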