On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study

Data mining and knowledge discovery, 2016-07, Vol.30 (4), p.891-927 [Peer Reviewed Journal]

The Author(s) 2016; ISSN: 1384-5810; EISSN: 1573-756X; DOI: 10.1007/s10618-015-0444-8

Title:
On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study
Author: Campos, Guilherme O. ; Zimek, Arthur ; Sander, Jörg ; Campello, Ricardo J. G. B. ; Micenková, Barbora ; Schubert, Erich ; Assent, Ira ; Houle, Michael E.
Subjects: Algorithms ; Artificial Intelligence ; Benchmarking ; Chemistry and Earth Sciences ; Computer Science ; Constants ; Data analysis ; Data mining ; Data Mining and Knowledge Discovery ; Datasets ; Information Storage and Retrieval ; Outliers (statistics) ; Physics ; Semantics ; Statistics for Engineering ; Tasks
Is Part Of: Data mining and knowledge discovery, 2016-07, Vol.30 (4), p.891-927
Description: The evaluation of unsupervised outlier detection algorithms is a constant challenge in data mining research. Little is known regarding the strengths and weaknesses of different standard outlier detection models, and the impact of parameter choices for these algorithms. The scarcity of appropriate benchmark datasets with ground truth annotation is a significant impediment to the evaluation of outlier methods. Even when labeled datasets are available, their suitability for the outlier detection task is typically unknown. Furthermore, the biases of commonly-used evaluation measures are not fully understood. It is thus difficult to ascertain the extent to which newly-proposed outlier detection methods improve over established methods. In this paper, we perform an extensive experimental study on the performance of a representative set of standard k nearest neighborhood-based methods for unsupervised outlier detection, across a wide variety of datasets prepared for this purpose. Based on the overall performance of the outlier detection methods, we provide a characterization of the datasets themselves, and discuss their suitability as outlier detection benchmark sets. We also examine the most commonly-used measures for comparing the performance of different methods, and suggest adaptations that are more suitable for the evaluation of outlier detection results.
Publisher: New York: Springer US
Language: English
Identifier: ISSN: 1384-5810
EISSN: 1573-756X
DOI: 10.1007/s10618-015-0444-8
Source: AUTh Library subscriptions: ProQuest Central
