Using Open-Source Tools in Support of Neglected Diseases Drug Discovery


We present our efforts to generate and validate multiple predictive machine learning models (e.g., random forests, k-nearest neighbors, support vector machines, self-organizing maps, naïve Bayes) in support of drug discovery for neglected tropical diseases. Screening data retrieved from ChEMBL-NTD were used to construct and validate the models. Programs written in the R and Python software ecosystems were used for data retrieval, curation, visualization, analysis, mining, and reporting. End-to-end workflows using both ecosystems will be presented. We demonstrate how one might access these models using Shiny, a web application framework for R, and Jupyter notebooks. Each model is collected into a compendium, a ‘container’ for all the elements that make up a model and its associated description: the primary data, the annotated computational code, figures, tables, and derived data, together with textual documentation and conclusions. These compendia are meant to enable the practice of reproducible research: one may re-run the analyses, run them with new data sets, or modify the code for other purposes. The primary purpose of this work is to make the functionality of the R and Python scripts (i.e., the predictive models) available to interested parties, regardless of their knowledge of R or Python. Free and open access to predictive models supporting neglected diseases drug discovery is meant to complement the research activities of all investigators, in particular those with limited access to computational tools and algorithms.
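As a rough illustration of one of the model types named above, the following is a minimal sketch of a k-nearest-neighbors activity classifier using only the Python standard library. It is not the authors' code: the Tanimoto similarity measure, the function names `tanimoto` and `knn_predict`, and the toy fingerprints in `TRAINING` are all assumptions made for illustration, and none of the data comes from ChEMBL-NTD.

```python
from collections import Counter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def knn_predict(query: frozenset, training: list, k: int = 3) -> str:
    """Return the majority label among the k training compounds most similar to the query."""
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (fingerprint bit set, activity label). Illustrative only.
TRAINING = [
    (frozenset({1, 2, 3, 4}), "active"),
    (frozenset({1, 2, 3, 5}), "active"),
    (frozenset({2, 3, 4, 6}), "active"),
    (frozenset({10, 11, 12}), "inactive"),
    (frozenset({10, 11, 13}), "inactive"),
    (frozenset({11, 12, 14}), "inactive"),
]

if __name__ == "__main__":
    print(knn_predict(frozenset({1, 2, 3}), TRAINING))
```

In a real workflow the fingerprints would be computed from curated structures and the model validated on held-out screening data; the sketch only shows the voting step.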


Paul received his PhD in Physical Chemistry from Rensselaer Polytechnic Institute and held a postdoctoral fellowship with the IBM Data Systems Division. He previously worked as a computational chemist (QSAR, QSPR, ligand-based and structure-based pharmacophore development, cheminformatics) at Sterling Winthrop Research Institute, Procept, Pfizer, and Scynexis, and is currently the Senior Data Scientist at Solvay.
