Challenges of Hate Speech Detection in Social Media: Data Scarcity, and Leveraging External Resources

SN computer science, 2021-04, Vol.2 (2), Article 95 [Peer Reviewed Journal]

The Author(s) 2021 ;ISSN: 2662-995X ;ISSN: 2661-8907 ;EISSN: 2661-8907 ;DOI: 10.1007/s42979-021-00457-3

Title:
Challenges of Hate Speech Detection in Social Media: Data Scarcity, and Leveraging External Resources
Author: Kovács, György ; Alonso, Pedro ; Saini, Rajkumar
Subjects: BERT ; Computer Imaging ; Computer Science ; Computer Systems Organization and Communication Networks ; Data Structures and Information Theory ; Deep language processing ; Hate speech ; Information Systems and Communication Service ; Machine Learning ; Maskininlärning ; Original Research ; Pattern Recognition and Graphics ; Social Media Analytics and its Evaluation ; Software Engineering/Programming and Operating Systems ; Transfer learning ; Vision ; Vocabulary augmentation
Is Part Of: SN computer science, 2021-04, Vol.2 (2), Article 95
Description: The detection of hate speech in social media is a crucial task. The uncontrolled spread of hate has the potential to gravely damage our society, and severely harm marginalized people or groups. A major arena for spreading hate speech online is social media. This significantly contributes to the difficulty of automatic detection, as social media posts include paralinguistic signals (e.g. emoticons and hashtags), and their linguistic content contains plenty of poorly written text. Another difficulty is presented by the context-dependent nature of the task and the lack of consensus on what constitutes hate speech, which makes the task difficult even for humans. This makes creating large labeled corpora difficult and resource-consuming. The problem posed by ungrammatical text has been largely mitigated by the recent emergence of deep neural network (DNN) architectures that have the capacity to efficiently learn various features. For this reason, we proposed a deep natural language processing (NLP) model, combining convolutional and recurrent layers, for the automatic detection of hate speech in social media data. We applied our model to the HASOC2019 corpus and attained a macro F1 score of 0.63 in hate speech detection on the HASOC test set. The capacity of DNNs for efficient learning, however, also means an increased risk of overfitting, particularly when limited training data is available (as was the case for HASOC). For this reason, we investigated different methods for expanding the resources used. We explored various opportunities, such as leveraging unlabeled data and similarly labeled corpora, as well as the use of novel models. Our results showed that by doing so, it was possible to significantly increase the classification score attained.
Publisher: Singapore: Springer Singapore
Language: English
Identifier: ISSN: 2662-995X
ISSN: 2661-8907
EISSN: 2661-8907
DOI: 10.1007/s42979-021-00457-3
Source: SWEPUB Freely available online
