KMSAV: Korean multi‐speaker spontaneous audiovisual dataset
ETRI Journal, 2024, 46(1), pp. 71-81
[Peer Reviewed Journal]
1225-6463/$ © 2024 ETRI; ISSN: 1225-6463; EISSN: 2233-7326; DOI: 10.4218/etrij.2023-0352
Title:
KMSAV: Korean multi‐speaker spontaneous audiovisual dataset
Author:
Park, Kiyoung; Oh, Changhan; Dong, Sunghee
Subjects:
audiovisual data; dataset; multi-speaker spontaneous data; multimodal data; speech recognition; electronics/information and communication engineering
Is Part Of:
ETRI Journal, 2024, 46(1), pp. 71-81
Description:
Recent advances in deep learning for speech and visual recognition have accelerated the development of multimodal speech recognition, yielding many innovative results. We introduce a Korean audiovisual speech recognition corpus. This dataset comprises approximately 150 h of manually transcribed and annotated audiovisual data, supplemented with an additional 2000 h of untranscribed videos collected from YouTube under the Creative Commons License. The dataset is intended to be freely accessible for unrestricted research purposes. Along with the corpus, we propose an open-source framework for automatic speech recognition (ASR) and audiovisual speech recognition (AVSR). We validate the effectiveness of the corpus with evaluations using state-of-the-art ASR and AVSR techniques, capitalizing on both pretrained models and fine-tuning processes. After fine-tuning, ASR and AVSR achieve character error rates of 11.1% and 18.9%, respectively. This error difference highlights the need for improvement in AVSR techniques. We expect that our corpus will be an instrumental resource to support improvements in AVSR.
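The character error rate (CER) quoted in the description is conventionally the character-level edit distance between a reference transcript and a recognizer's hypothesis, divided by the reference length. The sketch below illustrates that general definition only. It is not taken from the KMSAV framework, the function names are hypothetical, and removing spaces before scoring is an assumption, since the record does not say how the authors normalize text.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of reference characters.

    Assumption: spaces are removed before scoring; other evaluation
    setups may keep them or apply further normalization.
    """
    ref_chars = reference.replace(" ", "")
    hyp_chars = hypothesis.replace(" ", "")
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # One substituted syllable out of five reference characters.
    print(character_error_rate("안녕하세요", "안녕하세오"))
```

Under this definition, a CER of 11.1% corresponds to roughly one character-level error per nine reference characters.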
Publisher:
Electronics and Telecommunications Research Institute (ETRI)
Language:
English; Korean
Identifier:
ISSN: 1225-6463
EISSN: 2233-7326
DOI: 10.4218/etrij.2023-0352
Source:
DOAJ Directory of Open Access Journals