Computationally identifying hot spots in protein-DNA binding interfaces using an ensemble approach

BMC bioinformatics, 2020-09, Vol.21 (Suppl 13), p.1-384, Article 384 [Peer Reviewed Journal]

COPYRIGHT 2020 BioMed Central Ltd.; 2020. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.; The Author(s) 2020; ISSN: 1471-2105; EISSN: 1471-2105; DOI: 10.1186/s12859-020-03675-3; PMID: 32938375

  • Author: Pan, Yuliang ; Zhou, Shuigeng ; Guan, Jihong
  • Subjects: Algorithms ; Benchmarks ; Binding ; Classifiers ; Computer applications ; Datasets ; Deoxyribonucleic acid ; DNA ; DNA binding ; Ensemble stacking classifier ; Feature selection ; Free energy ; Gene expression ; Hot spots ; Hydrogen bonds ; Interfaces ; Learning algorithms ; Machine learning ; Methods ; Mutation ; Protein binding ; Protein-DNA complexes ; Proteins ; Redundancy ; Residues ; Sensitivity ; Solvents
  • Description: Abstract
    Background: Protein-DNA interaction governs a large number of cellular processes, and it can be altered by a small fraction of interface residues, i.e., the so-called hot spots, which account for most of the interface binding free energy. Accurate prediction of hot spots is critical to understanding the principles of protein-DNA interactions. Some existing computational methods can already predict a large number of hot residues accurately and efficiently. However, their performance is limited by the scarcity of experimentally validated hot-spot residues in protein-DNA complexes and by the low diversity of the features they employ.
    Results: Here, we report a new computational method for effectively predicting hot spots in protein-DNA binding interfaces. This method, called PreHots (short for Predicting Hot spots), adopts an ensemble stacking classifier that integrates different machine learning classifiers into a robust model built on 19 features selected by a sequential backward feature selection algorithm. To this end, we constructed two new and reliable datasets (a benchmark dataset for model training and an independent dataset for validation), which together consist of 123 hot spots and 137 non-hot spots from 89 protein-DNA complexes. The data were manually collected from the literature and existing databases under a strict redundancy-removal process. Our method achieves a sensitivity of 0.813 and an AUC score of 0.868 in 10-fold cross-validation on the benchmark dataset, and a sensitivity of 0.818 and an AUC score of 0.820 on the independent test dataset. These results show that our approach outperforms the existing ones.
    Conclusions: PreHots, which is based on a stacking ensemble of boosting algorithms, can reliably predict hot spots at protein-DNA binding interfaces on a large scale. Compared with existing methods, PreHots achieves better prediction performance.
    Both the PreHots webserver and the datasets are freely available at: http://dmb.tongji.edu.cn/tools/PreHots/ .
  • Publisher: London: BioMed Central Ltd
  • Language: English
  • Identifier: ISSN: 1471-2105
    EISSN: 1471-2105
    DOI: 10.1186/s12859-020-03675-3
    PMID: 32938375
  • Source: GFMER Free Medical Journals
    PubMed Central
    Springer Nature OA/Free Journals
    ROAD: Directory of Open Access Scholarly Resources
    ProQuest Central
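
The abstract reports sensitivity and AUC and states that the 19 input features were chosen by sequential backward feature selection. The following is a minimal Python sketch of those three ideas under stated assumptions, not the PreHots implementation: the feature names "a", "b", "c", the toy data, and the selection criterion (AUC of the summed feature values, `toy_score`) are hypothetical stand-ins. In PreHots the criterion would presumably come from evaluating the stacking classifier, which this sketch does not reproduce.

```python
from typing import Callable, Dict, List, Sequence


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Recall on the positive class, TP / (TP + FN), with label 1 = hot spot."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative (ties count 1/2), i.e. Mann-Whitney U / (n_pos * n_neg)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sequential_backward_selection(
    features: Sequence[str],
    score: Callable[[List[str]], float],
    k: int,
) -> List[str]:
    """Start from all features and repeatedly drop the one whose removal
    leaves the highest score, until only k features remain."""
    selected = list(features)
    while len(selected) > k:
        best_subset: List[str] = []
        best_score = float("-inf")
        for f in selected:
            candidate = [g for g in selected if g != f]
            s = score(candidate)
            if s > best_score:
                best_subset, best_score = candidate, s
        selected = best_subset
    return selected


# Hypothetical toy data: 3 hot spots (label 1) and 3 non-hot spots (label 0).
labels = [1, 1, 1, 0, 0, 0]
rows: List[Dict[str, int]] = [
    {"a": 9, "b": 1, "c": 6},
    {"a": 8, "b": 9, "c": 7},
    {"a": 7, "b": 2, "c": 9},
    {"a": 2, "b": 8, "c": 1},
    {"a": 1, "b": 3, "c": 4},
    {"a": 3, "b": 7, "c": 2},
]


def toy_score(subset: List[str]) -> float:
    """Hypothetical criterion: AUC of the summed values of the chosen features."""
    return roc_auc(labels, [sum(r[f] for f in subset) for r in rows])


chosen = sequential_backward_selection(["a", "b", "c"], toy_score, k=2)
print(chosen)  # ['a', 'c']: dropping the noisy feature "b" keeps AUC at 1.0
print(sensitivity(labels, [1, 1, 0, 0, 1, 0]))  # 2 of 3 hot spots recovered
```

Each backward step re-scores every remaining feature, so shrinking n features down to a small k costs roughly O(n²) criterion evaluations; when the criterion is a cross-validated ensemble, that cost dominates the run time.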
