UFO2: A unified pre-training framework for online and offline speech recognition

arXiv.org, 2023-04

2023. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. EISSN: 2331-8422; DOI: 10.48550/arxiv.2210.14515

  • Title:
    UFO2: A unified pre-training framework for online and offline speech recognition
  • Author: Fu, Li ; Li, Siqi ; Li, Qingtao ; Deng, Liping ; Li, Fangzhu ; Lu, Fan ; Chen, Meng ; He, Xiaodong
  • Subjects: Automatic speech recognition ; Computer Science - Sound ; Training
  • Is Part Of: arXiv.org, 2023-04
  • Description: In this paper, we propose a Unified pre-training Framework for Online and Offline (UFO2) Automatic Speech Recognition (ASR), which 1) merges the two separate training workflows for the online and offline modes into a single process, and 2) improves Word Error Rate (WER) when only a limited amount of annotated speech is available. Specifically, we extend the conventional offline-mode Self-Supervised Learning (SSL)-based ASR approach into a unified one, in which model training is conditioned on both full-context and dynamic-chunked inputs. To enhance the pre-trained representation model, a stop-gradient operation is applied to decouple the online-mode objectives from the quantizer. Moreover, in both the pre-training and the downstream fine-tuning stages, joint losses are proposed to train the unified model with full weight sharing between the two modes. Experimental results on the LibriSpeech dataset show that UFO2 outperforms the SSL-based baseline method, with relative WER reductions of 29.7% and 18.2% in the offline and online modes, respectively.
  • Publisher: Ithaca: Cornell University Library, arXiv.org
  • Language: English
  • Identifier: EISSN: 2331-8422
    DOI: 10.48550/arxiv.2210.14515
  • Source: arXiv.org
    Open Access: Freely Accessible Journals by multiple vendors
    ROAD: Directory of Open Access Scholarly Resources
    ProQuest Central
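
The description mentions training on both full-context and dynamic-chunked inputs with a joint loss. The following is a minimal sketch of what that could look like, not the authors' implementation. The masking policy (each frame sees its own chunk and all earlier chunks), the sampling range in `sample_chunk_size`, and the equal weighting in `joint_loss` are all assumptions made for illustration, and every function name here is hypothetical.

```python
import random


def chunk_attention_mask(num_frames, chunk_size):
    """Return mask[i][j] == True if frame i may attend to frame j.

    Assumed policy: a frame sees its own chunk and every earlier chunk.
    A chunk_size <= 0 or >= num_frames means full context (offline mode).
    """
    if chunk_size <= 0 or chunk_size >= num_frames:
        return [[True] * num_frames for _ in range(num_frames)]
    mask = []
    for i in range(num_frames):
        # Index one past the last frame of the chunk that contains frame i.
        visible_end = min((i // chunk_size + 1) * chunk_size, num_frames)
        mask.append([j < visible_end for j in range(num_frames)])
    return mask


def sample_chunk_size(num_frames, max_chunk=25, rng=random):
    """Draw a chunk size for one batch (the range is a hypothetical choice)."""
    return rng.randint(1, min(max_chunk, num_frames))


def joint_loss(offline_loss, online_loss, weight=0.5):
    """Weighted sum of the two modes' losses; the weight is an assumption."""
    return weight * offline_loss + (1.0 - weight) * online_loss


if __name__ == "__main__":
    # Online mode: 5 frames in chunks of 2.
    for row in chunk_attention_mask(5, 2):
        print("".join("x" if visible else "." for visible in row))
    # Offline mode: every frame sees every other frame.
    print(all(all(row) for row in chunk_attention_mask(5, 0)))
```

Because both masks are applied to the same encoder, a single set of weights can serve both modes, which is the property the description calls full weight sharing. The stop-gradient step is not shown, since it needs an automatic-differentiation framework.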
