KMSAV: Korean multi‐speaker spontaneous audiovisual dataset

ETRI Journal, 2024, 46(1), pp.71-81 [Peer Reviewed Journal]

1225‐6463/$ © 2024 ETRI ; ISSN: 1225-6463 ; EISSN: 2233-7326 ; DOI: 10.4218/etrij.2023-0352

Title:
KMSAV: Korean multi‐speaker spontaneous audiovisual dataset
Author: Park, Kiyoung ; Oh, Changhan ; Dong, Sunghee
Subjects: audiovisual data ; dataset ; multi-speaker spontaneous data ; multimodal data ; speech recognition ; electronics/information and communication engineering
Is Part Of: ETRI Journal, 2024, 46(1), pp.71-81
Description: Recent advances in deep learning for speech and visual recognition have accelerated the development of multimodal speech recognition, yielding many innovative results. We introduce a Korean audiovisual speech recognition corpus. This dataset comprises approximately 150 h of manually transcribed and annotated audiovisual data supplemented with an additional 2000 h of untranscribed videos collected from YouTube under the Creative Commons License. The dataset is intended to be freely accessible for unrestricted research purposes. Along with the corpus, we propose an open‐source framework for automatic speech recognition (ASR) and audiovisual speech recognition (AVSR). We validate the effectiveness of the corpus with evaluations using state‐of‐the‐art ASR and AVSR techniques, capitalizing on both pretrained models and fine‐tuning processes. After fine‐tuning, ASR and AVSR achieve character error rates of 11.1% and 18.9%, respectively. This error difference highlights the need for improvement in AVSR techniques. We expect that our corpus will be an instrumental resource to support improvements in AVSR.
Publisher: Electronics and Telecommunications Research Institute (ETRI)
Language: English ; Korean
Identifier: ISSN: 1225-6463
EISSN: 2233-7326
DOI: 10.4218/etrij.2023-0352
Source: DOAJ Directory of Open Access Journals
