Machine learning classification can reduce false positives in structure-based virtual screening

Proceedings of the National Academy of Sciences - PNAS, 2020-08, Vol.117 (31), p.18477-18488 [Peer Reviewed Journal]

Copyright National Academy of Sciences Aug 4, 2020 ; Copyright © 2020 the Author(s). Published by PNAS. 2020 ; ISSN: 0027-8424 ; EISSN: 1091-6490 ; DOI: 10.1073/pnas.2000585117 ; PMID: 32669436

Title: Machine learning classification can reduce false positives in structure-based virtual screening
Author: Adeshina, Yusuf O. ; Deeds, Eric J. ; Karanicolas, John
Subjects: Acetylcholinesterase ; Benchmarks ; Biological Sciences ; Classifiers ; Computer applications ; Datasets ; Learning algorithms ; Machine learning ; Optimization ; Overtraining ; Physical Sciences ; Screening ; Training
Is Part Of: Proceedings of the National Academy of Sciences - PNAS, 2020-08, Vol.117 (31), p.18477-18488
Description: With the recent explosion in the size of libraries available for screening, virtual screening is positioned to assume a more prominent role in early drug discovery’s search for active chemical matter. In typical virtual screens, however, only about 12% of the top-scoring compounds actually show activity when tested in biochemical assays. We argue that most scoring functions used for this task have been developed with insufficient thoughtfulness into the datasets on which they are trained and tested, leading to overly simplistic models and/or overtraining. These problems are compounded in the literature because studies reporting new scoring methods have not validated their models prospectively within the same study. Here, we report a strategy for building a training dataset (D-COID) that aims to generate highly compelling decoy complexes that are individually matched to available active complexes. Using this dataset, we train a general-purpose classifier for virtual screening (vScreenML) that is built on the XGBoost framework. In retrospective benchmarks, our classifier shows outstanding performance relative to other scoring functions. In a prospective context, nearly all candidate inhibitors from a screen against acetylcholinesterase show detectable activity; beyond this, 10 of 23 compounds have IC50 better than 50 μM. Without any medicinal chemistry optimization, the most potent hit has IC50 280 nM, corresponding to Ki of 173 nM. These results support using the D-COID strategy for training classifiers in other computational biology tasks, and for vScreenML in virtual screening campaigns against other protein targets. Both D-COID and vScreenML are freely distributed to facilitate such efforts.
Publisher: Washington: National Academy of Sciences
Language: English
Identifier: ISSN: 0027-8424
EISSN: 1091-6490
DOI: 10.1073/pnas.2000585117
PMID: 32669436
Source: Freely Accessible Journals
PubMed Central
