Types of minority class examples and their influence on learning classifiers from imbalanced data

Journal of intelligent information systems, 2016-06, Vol.46 (3), p.563-597 [Peer Reviewed Journal]

The Author(s) 2015 ;Springer Science+Business Media New York 2016 ;ISSN: 0925-9902 ;EISSN: 1573-7675 ;DOI: 10.1007/s10844-015-0368-1

  • Title:
    Types of minority class examples and their influence on learning classifiers from imbalanced data
  • Author: Napierala, Krystyna ; Stefanowski, Jerzy
  • Subjects: Algorithms ; Analysis ; Artificial Intelligence ; Classification ; Classifiers ; Computer Science ; Computer simulation ; Data Structures and Information Theory ; Datasets ; Identification ; Information Storage and Retrieval ; Information systems ; Intelligent systems ; IT in Business ; Learning ; Methods ; Minorities ; Natural Language Processing (NLP) ; Neighborhoods ; Studies ; Visualization
  • Is Part Of: Journal of intelligent information systems, 2016-06, Vol.46 (3), p.563-597
  • Description: Many real-world applications reveal difficulties in learning classifiers from imbalanced data. Although several methods for improving classifiers have been introduced, identifying the conditions under which a particular method can be used efficiently is still an open research problem. It is also worth studying the nature of imbalanced data, the characteristics of the minority class distribution and their influence on classification performance. However, current studies on imbalanced data difficulty factors have mainly been conducted on artificial datasets, and their conclusions are not easily applicable to real-world problems, partly because methods for identifying these factors are not sufficiently developed. In our paper, we capture difficulties of class distribution in real datasets by considering four types of minority class examples: safe, borderline, rare and outliers. First, we confirm their occurrence in real data by exploring multidimensional visualizations of selected datasets. Then, we introduce a method for identifying these types of examples, which is based on analyzing the class distribution in the local neighbourhood of the considered example. Two ways of modeling this neighbourhood are presented: with k-nearest examples and with kernel functions. Experiments with artificial datasets show that these methods are able to re-discover simulated types of examples. Further contributions of this paper include a comprehensive experimental study with 26 real-world imbalanced datasets, where (1) we identify new data characteristics based on the analysis of types of minority examples; (2) we demonstrate that considering the results of this analysis allows us to differentiate the classification performance of popular classifiers and pre-processing methods and to evaluate their areas of competence. Finally, we highlight directions for exploiting the results of our analysis to develop new algorithms for learning classifiers and pre-processing methods.
  • Publisher: New York: Springer US
  • Language: English
  • Identifier: ISSN: 0925-9902
    EISSN: 1573-7675
    DOI: 10.1007/s10844-015-0368-1
  • Source: SpringerOpen
    AUTh Library subscriptions: ProQuest Central
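
The k-nearest-examples variant mentioned in the Description can be illustrated with a minimal sketch. This record does not give the authors' procedure, so everything specific here is an assumption made for illustration: the function name `classify_minority_examples`, the default `k=5`, the Euclidean distance, and the ratio thresholds that map the share of same-class neighbours to the four types.

```python
import numpy as np


def classify_minority_examples(X, y, minority_label, k=5):
    """Label each minority example as safe, borderline, rare or outlier.

    Assumptions (not taken from the catalog record): Euclidean distance,
    k=5 by default, and the ratio thresholds used below.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = {}
    for i in np.flatnonzero(y == minority_label):
        # Distance from example i to every example in the dataset.
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf  # exclude the example itself from its neighbourhood
        neighbours = np.argsort(dist, kind="stable")[:k]
        # Share of the k nearest neighbours that are also minority examples.
        ratio = np.sum(y[neighbours] == minority_label) / k
        if ratio > 0.7:        # hypothetical threshold: mostly same-class neighbours
            kind = "safe"
        elif ratio > 0.3:      # hypothetical threshold: mixed neighbourhood
            kind = "borderline"
        elif ratio > 0.1:      # hypothetical threshold: nearly isolated
            kind = "rare"
        else:                  # no same-class neighbours at all
            kind = "outlier"
        labels[int(i)] = kind
    return labels
```

The function returns a dictionary that maps the row index of each minority example to its assigned type. The kernel-based variant named in the Description would replace the fixed-size neighbour count with a distance-weighted estimate, which is not sketched here.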