
The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets

PloS one, 2015-03, Vol.10 (3), p.e0118432-e0118432 [Peer Reviewed Journal]

COPYRIGHT 2015 Public Library of Science. 2015 Saito, Rehmsmeier. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. ISSN: 1932-6203; EISSN: 1932-6203; DOI: 10.1371/journal.pone.0118432; PMID: 25738806


Title:
The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets
Author: Saito, Takaya ; Rehmsmeier, Marc
Brock, Guy
Subjects: Alternatives ; Analysis ; Bioinformatics ; Biology ; Classification ; Classification - methods ; Classifiers ; Computational biology ; Datasets ; Datasets as Topic ; Genomes ; Genomics ; Informatics ; Methods ; MicroRNAs ; Performance evaluation ; Predictions ; Recall ; ROC Curve ; Sensitivity analysis
Is Part Of: PloS one, 2015-03, Vol.10 (3), p.e0118432-e0118432
Description: Binary classifiers are routinely evaluated with performance measures such as sensitivity and specificity, and performance is frequently illustrated with Receiver Operating Characteristics (ROC) plots. Alternative measures such as positive predictive value (PPV) and the associated Precision/Recall (PRC) plots are used less frequently. Many bioinformatics studies develop and evaluate classifiers that are to be applied to strongly imbalanced datasets in which the number of negatives outweighs the number of positives significantly. While ROC plots are visually appealing and provide an overview of a classifier's performance across a wide range of specificities, one can ask whether ROC plots could be misleading when applied in imbalanced classification scenarios. We show here that the visual interpretability of ROC plots in the context of imbalanced datasets can be deceptive with respect to conclusions about the reliability of classification performance, owing to an intuitive but wrong interpretation of specificity. PRC plots, on the other hand, can provide the viewer with an accurate prediction of future classification performance due to the fact that they evaluate the fraction of true positives among positive predictions. Our findings have potential implications for the interpretation of a large number of studies that use ROC plots on imbalanced datasets.
Publisher: United States: Public Library of Science
Language: English
Identifier: ISSN: 1932-6203
EISSN: 1932-6203
DOI: 10.1371/journal.pone.0118432
PMID: 25738806
Source: Public Library of Science (PLoS) Journals Open Access
Geneva Foundation Free Medical Journals at publisher websites
MEDLINE
PubMed Central
ProQuest Central
DOAJ Directory of Open Access Journals
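
The argument in the Description can be illustrated with a small calculation. The sketch below is not taken from the article: the class sizes, sensitivity, and specificity are made-up values, and the function names are hypothetical. It shows that a classifier with a fixed sensitivity and specificity occupies the same point in ROC space regardless of class balance, while its precision (PPV), the fraction of true positives among positive predictions, drops sharply once negatives greatly outnumber positives.

```python
# Illustrative sketch only: all numbers below are assumed, not from the article.

def roc_point(sensitivity, specificity):
    """Return the (false positive rate, true positive rate) pair plotted in ROC space."""
    return (1.0 - specificity, sensitivity)


def precision(n_pos, n_neg, sensitivity, specificity):
    """Return precision (PPV) = TP / (TP + FP) for the given class sizes."""
    true_positives = sensitivity * n_pos
    false_positives = (1.0 - specificity) * n_neg
    return true_positives / (true_positives + false_positives)


# Hypothetical classifier: 80% sensitivity, 90% specificity.
SENSITIVITY = 0.8
SPECIFICITY = 0.9

# Hypothetical datasets: one balanced, one with 100 negatives per positive.
balanced_ppv = precision(1000, 1000, SENSITIVITY, SPECIFICITY)
imbalanced_ppv = precision(100, 10000, SENSITIVITY, SPECIFICITY)

if __name__ == "__main__":
    # The ROC point depends only on the two rates, not on the class sizes.
    print("ROC point (FPR, TPR):", roc_point(SENSITIVITY, SPECIFICITY))
    print(f"Precision, balanced:   {balanced_ppv:.3f}")
    print(f"Precision, imbalanced: {imbalanced_ppv:.3f}")
```

Under these assumed values, roughly nine in ten positive predictions are correct on the balanced dataset, but fewer than one in ten on the imbalanced one, even though the ROC point is identical in both cases.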
