
Improving protein-protein interactions prediction accuracy using XGBoost feature selection and stacked ensemble classifier

Computers in biology and medicine, 2020-08, Vol.123, p.103899, Article 103899 [Peer Reviewed Journal]

Copyright © 2020 Elsevier Ltd. All rights reserved. ISSN: 0010-4825; EISSN: 1879-0534; DOI: 10.1016/j.compbiomed.2020.103899; PMID: 32768046


  • Title:
    Improving protein-protein interactions prediction accuracy using XGBoost feature selection and stacked ensemble classifier
  • Author: Chen, Cheng ; Zhang, Qingmei ; Yu, Bin ; Yu, Zhaomin ; Lawrence, Patrick J. ; Ma, Qin ; Zhang, Yan
  • Subjects: Accuracy ; Algorithms ; Amino acid composition ; Amino acids ; Biological activity ; Classifiers ; Composition ; Datasets ; Decision trees ; Drug development ; Evolution ; Feature selection ; Gene expression ; Learning algorithms ; Linear programming ; Machine learning ; Methods ; Multi-information fusion ; Noise reduction ; Predictions ; Protein interaction ; Protein-protein interactions ; Proteins ; Regression analysis ; Source code ; Stacked ensemble classifier ; Support vector machines ; Test sets ; XGBoost
  • Is Part Of: Computers in biology and medicine, 2020-08, Vol.123, p.103899, Article 103899
  • Description: Protein-protein interactions (PPIs) are involved in most cellular activities at the proteomic level, making the study of PPIs necessary for understanding any biological process. Machine learning approaches have been explored, leading to more accurate and generalized PPI predictions. In this paper, we propose a predictive framework called StackPPI. First, we use pseudo amino acid composition (PAAC), Moreau-Broto, Moran and Geary autocorrelation descriptors (AD), amino acid composition position-specific scoring matrix (AAC-PSSM), bi-gram position-specific scoring matrix (Bi-PSSM), and composition, transition and distribution (CTD) to encode biologically relevant features. Second, we employ XGBoost to reduce feature noise and perform dimensionality reduction through gradient boosting and average gain. Third, the resulting optimized features are analyzed by StackPPI, a PPI predictor we have developed from a stacked ensemble classifier consisting of random forest, extremely randomized trees and logistic regression algorithms. Five-fold cross-validation shows that StackPPI can predict PPIs with an ACC of 89.27%, MCC of 0.7859 and AUC of 0.9561 on Helicobacter pylori, and with an ACC of 94.64%, MCC of 0.8934 and AUC of 0.9810 on Saccharomyces cerevisiae. We find that StackPPI improves protein interaction prediction accuracy on independent test sets compared with state-of-the-art models. Finally, we highlight StackPPI's ability to infer biologically significant PPI networks. StackPPI's accurate prediction of functional pathways makes it the logical choice for studying the underlying mechanism of PPIs, especially as it applies to drug design. The datasets and source code used to create StackPPI are available here: https://github.com/QUST-AIBBDRC/StackPPI/.
    • A new method, StackPPI, is proposed to predict protein-protein interactions.
    • PAAC, AD, AAC-PSSM, Bi-PSSM and CTD are fused to extract physicochemical, evolutionary and sequence information.
    • XGBoost feature selection is employed to eliminate redundancy and retain the optimal feature subset.
    • We build a stacked ensemble classifier using RF, ET and LR for the first time.
    • StackPPI has good generalization ability on independent test sets and PPI network datasets.
  • Publisher: United States: Elsevier Ltd
  • Language: English
  • Identifier: ISSN: 0010-4825
    EISSN: 1879-0534
    DOI: 10.1016/j.compbiomed.2020.103899
    PMID: 32768046
  • Source: AUTh Library subscriptions: ProQuest Central
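
The pipeline outlined in the Description (feature selection by gradient boosting, then a stacked ensemble evaluated with five-fold cross-validation) can be sketched roughly as follows. This is an illustrative sketch under several assumptions, not the authors' code, which is in the repository linked above. The data are synthetic stand-ins generated with `make_classification`, not real PPI features. scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost, so the ranking uses impurity-based importances rather than XGBoost's average gain. The arrangement of RF and ET as base learners with LR as the meta-learner is an assumption that the abstract does not spell out. All hyperparameters are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for fused protein-pair feature vectors (NOT real PPI data).
X, y = make_classification(
    n_samples=300, n_features=60, n_informative=10, random_state=0
)

# Feature selection (stand-in for XGBoost): fit a gradient-boosted model and
# keep the features whose importance exceeds the mean importance.
selector = SelectFromModel(
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    threshold="mean",
)

# Stacked ensemble. Assumption: RF and ET are base learners whose
# out-of-fold probabilities feed a logistic regression meta-learner.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Putting the selector inside the pipeline refits it on each training fold,
# so the held-out fold never influences which features are kept.
model = make_pipeline(selector, stack)

# Five-fold cross-validation, collecting out-of-fold class-1 probabilities.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
pred = (proba >= 0.5).astype(int)

# The three metrics reported in the Description.
acc = accuracy_score(y, pred)
mcc = matthews_corrcoef(y, pred)
auc = roc_auc_score(y, proba)
print(f"ACC={acc:.4f}  MCC={mcc:.4f}  AUC={auc:.4f}")
```

The numbers printed here describe only the synthetic data and say nothing about the published results. With the xgboost package installed, one would presumably replace the selector's estimator with `xgboost.XGBClassifier(importance_type="gain")` to rank features by average gain, which is closer to what the Description states.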
