SANDPUMA: ensemble predictions of nonribosomal peptide chemistry reveal biosynthetic diversity across Actinobacteria

Bioinformatics (Oxford, England), 2017-10, Vol.33 (20), p.3202-3210 [Peer Reviewed Journal]

The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com ; Wageningen University & Research ; ISSN: 1367-4803 ; EISSN: 1367-4811 ; DOI: 10.1093/bioinformatics/btx400 ; PMID: 28633438

  • Title:
    SANDPUMA: ensemble predictions of nonribosomal peptide chemistry reveal biosynthetic diversity across Actinobacteria
  • Author: Chevrette, Marc G ; Aicheler, Fabian ; Kohlbacher, Oliver ; Currie, Cameron R ; Medema, Marnix H
  • Birol, Inanc
  • Subjects: Actinobacteria - enzymology ; Actinobacteria - genetics ; Actinobacteria - metabolism ; Algorithms ; Bioinformatica ; Bioinformatics ; Catalytic Domain ; Computational Biology - methods ; EPS ; Multigene Family ; Original Papers ; Peptide Synthases - metabolism ; Peptides - metabolism ; Sequence Analysis, Protein - methods ; Software ; Substrate Specificity
  • Is Part Of: Bioinformatics (Oxford, England), 2017-10, Vol.33 (20), p.3202-3210
  • Description: Nonribosomally synthesized peptides (NRPs) are natural products with widespread applications in medicine and biotechnology. Many algorithms have been developed to predict the substrate specificities of nonribosomal peptide synthetase adenylation (A) domains from DNA sequences, which enables prioritization, dereplication, and integration with other data types in discovery efforts. However, insufficient training data and a lack of clarity regarding prediction quality have impeded optimal use. Here, we introduce prediCAT, a new phylogenetics-inspired algorithm, which quantitatively estimates the degree of predictability of each A-domain. We then systematically benchmarked all algorithms on a newly gathered, independent test set of 434 A-domain sequences, showing that active-site-motif-based algorithms outperform whole-domain-based methods. Subsequently, we developed SANDPUMA, a powerful ensemble algorithm, based on newly trained versions of all high-performing algorithms, which significantly outperforms individual methods. Finally, we deployed SANDPUMA in a systematic investigation of 7635 Actinobacteria genomes, suggesting that NRP chemical diversity is much higher than previously estimated. SANDPUMA has been integrated into the widely used antiSMASH biosynthetic gene cluster analysis pipeline and is also available as an open-source, standalone tool. SANDPUMA is freely available at https://bitbucket.org/chevrm/sandpuma and as a docker image at https://hub.docker.com/r/chevrm/sandpuma/ under the GNU General Public License 3 (GPL3). Contact: chevrette@wisc.edu or marnix.medema@wur.nl. Supplementary data are available at Bioinformatics online.
  • Publisher: England: Oxford University Press
  • Language: English
  • Identifier: ISSN: 1367-4803
    EISSN: 1367-4811
    DOI: 10.1093/bioinformatics/btx400
    PMID: 28633438
  • Source: Journals@Ovid Open Access Journal Collection Rolling
    PubMed Central (Open access)
    Geneva Foundation Free Medical Journals at publisher websites
    MEDLINE
