Impact of SMOTE on Imbalanced Text Features for Toxic Comments Classification Using RVVC Model

IEEE access, 2021, Vol.9, p.78621-78634 [Peer Reviewed Journal]

Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021; ISSN: 2169-3536; EISSN: 2169-3536; DOI: 10.1109/ACCESS.2021.3083638; CODEN: IAECCG

  • Title:
    Impact of SMOTE on Imbalanced Text Features for Toxic Comments Classification Using RVVC Model
  • Author: Rupapara, Vaibhav ; Rustam, Furqan ; Shahzad, Hina Fatima ; Mehmood, Arif ; Ashraf, Imran ; Choi, Gyu Sang
  • Subjects: Blogs ; BoW ; Classifiers ; data re-sampling ; Datasets ; Deep learning ; Digital media ; ensemble classifier ; Feature extraction ; Logistics ; Machine learning ; Oversampling ; Social networking (online) ; Social networks ; Support vector machines ; synthetic minority oversampling technique ; text classification ; TF-IDF ; Toxic comments classification ; Voting ; Websites
  • Is Part Of: IEEE access, 2021, Vol.9, p.78621-78634
  • Description: Social media platforms and microblogging websites have rapidly gained popularity over the past few years. These platforms are used to express views and opinions about products, personalities, and events. Discussions and debates on these platforms often turn into fights that involve rude, disrespectful, and hateful comments, called toxic comments. Identifying toxic comments is regarded as an essential task for social media platforms. This study introduces an ensemble approach, called regression vector voting classifier (RVVC), to identify toxic comments on social media platforms. The ensemble combines logistic regression and a support vector classifier under soft voting criteria. Several experiments are performed on imbalanced and balanced versions of the dataset to analyze the performance of the proposed approach. To balance the data, the synthetic minority oversampling technique (SMOTE) is applied to the imbalanced dataset. Furthermore, two feature extraction approaches, term frequency-inverse document frequency (TF-IDF) and bag-of-words (BoW), are evaluated for their suitability. The performance of the proposed approach is compared with several machine learning classifiers using accuracy, precision, recall, and F1-score. Results suggest that RVVC outperforms all other individual models when TF-IDF features are used with the SMOTE-balanced dataset, achieving an accuracy of 0.97.
  • Publisher: Piscataway: IEEE
  • Language: English
  • Identifier: ISSN: 2169-3536
    EISSN: 2169-3536
    DOI: 10.1109/ACCESS.2021.3083638
    CODEN: IAECCG
  • Source: DOAJ Directory of Open Access Journals
    IEEE Open Access Journals
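
The description names two techniques, SMOTE oversampling and soft voting, that may be easier to follow with a small illustration. The sketch below is not the authors' code: it uses only the Python standard library, the function names `smote_oversample` and `soft_vote` are hypothetical, and the toy vectors and probabilities are invented stand-ins for TF-IDF features and for the class probabilities that a logistic regression and a support vector classifier might produce. It assumes at least two minority samples.

```python
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the core SMOTE idea).
    Assumes len(minority) >= 2."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def soft_vote(prob_rows):
    """Average per-class probabilities from several models and return
    (predicted class index, averaged probabilities)."""
    n_models = len(prob_rows)
    avg = [sum(col) / n_models for col in zip(*prob_rows)]
    return avg.index(max(avg)), avg


# Toy minority-class feature vectors (hypothetical stand-ins for TF-IDF rows).
minority = [[0.0, 1.0], [0.2, 0.8], [0.1, 0.9]]
synthetic = smote_oversample(minority, n_new=4)

# Hypothetical class probabilities from two models for one comment:
# the first model leans towards class 1, the second towards class 0.
label, avg = soft_vote([[0.4, 0.6], [0.7, 0.3]])
print(len(synthetic), label, avg)
```

Each synthetic point lies on the line segment between two existing minority samples, so the minority class grows without exact duplication. Soft voting averages probabilities instead of counting predicted labels, so a confident model can outweigh a hesitant one, as in the example where the two models disagree.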
