PlasForest: a homology-based random forest classifier for plasmid detection in genomic datasets

BMC bioinformatics, 2021-06, Vol.22 (1), p.1-349, Article 349 [Peer Reviewed Journal]

COPYRIGHT 2021 BioMed Central Ltd.; The Author(s) 2021. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. Distributed under a Creative Commons Attribution 4.0 International License. ISSN: 1471-2105; EISSN: 1471-2105; DOI: 10.1186/s12859-021-04270-w; PMID: 34174810

  • Title:
    PlasForest: a homology-based random forest classifier for plasmid detection in genomic datasets
  • Author: Pradier, Léa ; Tissot, Tazzio ; Fiston-Lavier, Anna-Sophie ; Bedhomme, Stéphanie
  • Subjects: Antibiotic resistance ; Antibiotics ; Bacteria ; Bioinformatics ; Chromosomes ; Classifiers ; Data mining ; Datasets ; Expression vectors ; Genes ; Genomes ; Genomic datasets ; Genomics ; Homology ; Horizontal transfer ; Identification and classification ; Life Sciences ; Machine learning ; Metagenomics ; Methodology ; Methods ; Pipelines ; Plasmid identification ; Plasmids ; Quantitative Methods ; Random forest classifier
  • Is Part Of: BMC bioinformatics, 2021-06, Vol.22 (1), p.1-349, Article 349
  • Description: Abstract. Background: Plasmids are mobile genetic elements that often carry accessory genes and are vectors for horizontal transfer between bacterial genomes. Detecting plasmids in large genomic datasets is crucial for analyzing their spread and quantifying their role in bacterial adaptation, particularly in the propagation of antibiotic resistance. Bioinformatics methods have been developed to detect plasmids. However, they suffer from low sensitivity (i.e., most plasmids remain undetected) or low precision (i.e., these methods identify chromosomes as plasmids), and overall are not suited to identifying plasmids in whole genomes that are not fully assembled (contigs and scaffolds). Results: We developed PlasForest, a homology-based random forest classifier that identifies bacterial plasmid sequences in partially assembled genomes. Without knowing the taxonomic origin of the samples, PlasForest identifies contigs as plasmids or chromosomes with an F1 score of 0.950. Notably, it can detect 77.4% of plasmid contigs below 1 kb with 2.8% false positives, and 99.9% of plasmid contigs over 50 kb with 2.2% false positives. Conclusions: PlasForest outperforms other currently available tools on genomic datasets by being both sensitive and precise. The performance of PlasForest on metagenomic assemblies is currently well below that of other k-mer-based methods, and we discuss how homology-based approaches could improve plasmid detection in such datasets.
  • Publisher: London: BioMed Central Ltd
  • Language: English
  • Identifier: ISSN: 1471-2105
    EISSN: 1471-2105
    DOI: 10.1186/s12859-021-04270-w
    PMID: 34174810
  • Source: Hyper Article en Ligne (HAL) (Open Access)
    GFMER Free Medical Journals
    PubMed Central
    ROAD: Directory of Open Access Scholarly Resources
    ProQuest Central
    DOAJ Directory of Open Access Journals
    Springer Nature OA Free Journals
