
An Efficient Unsupervised Approach for OCR Error Correction of Vietnamese OCR Text

Access, IEEE, 2023, Vol.11, p.58406-58421

2013 IEEE; DOI: 10.1109/ACCESS.2023.3283340

Full text available

  • Title: An Efficient Unsupervised Approach for OCR Error Correction of Vietnamese OCR Text
  • Author: Nguyen, Quoc-Dung ; Phan, Nguyet-Minh ; Kromer, Pavel ; Le, Duc-Anh
  • Subjects: Adaptation models ; attention-based encoder-decoder ; character edit ; Computational modeling ; Encoding ; Error correction ; hill climbing ; Linguistics ; OCR ; Optical character recognition ; Optimization ; Training data
  • Is Part Of: Access, IEEE, 2023, Vol.11, p.58406-58421
  • Description: OCR texts often contain different types of errors caused by low-quality scanned document images or limitations of the OCR software. In this paper, we propose a novel unsupervised approach for OCR error correction. Correction candidates for OCR errors are generated and explored in their neighborhoods using correction character edits controlled by an adapted hill-climbing algorithm. Correction characters are extracted only from the original ground truth texts, so they do not depend on the OCR texts in the training data. A weighted objective function, used to score and rank correction candidates, is heuristically tested to find optimal weight combinations. The proposed model is evaluated on an OCR text dataset originating from the Vietnamese handwritten database of the ICFHR 2018 Vietnamese online handwritten text recognition competition. The stability and complexity of the proposed model are also verified. The experimental results show that our model achieves competitive performance compared to the other models in the ICFHR 2018 competition.
  • Publisher: IEEE
  • Language: English
  • Identifier: DOI: 10.1109/ACCESS.2023.3283340
  • Source: IEEE Open Access Journals
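
The description mentions candidate generation by character edits, a hill-climbing search, and a weighted objective function. The following is a minimal illustrative sketch of that general idea, not the authors' implementation. The toy `LEXICON`, the two score terms (string similarity and relative word frequency), the weights `w_sim` and `w_freq`, and all function names are assumptions made for this example only.

```python
from difflib import SequenceMatcher

# Hypothetical toy word frequencies standing in for ground-truth text statistics.
LEXICON = {"text": 50, "test": 30, "next": 20, "tent": 5}
# Correction characters are taken only from the ground-truth vocabulary.
ALPHABET = sorted(set("".join(LEXICON)))


def neighbors(word):
    """Return all strings one character edit away (delete, substitute, insert)."""
    out = set()
    for i in range(len(word)):
        out.add(word[:i] + word[i + 1:])  # deletion
        for c in ALPHABET:
            out.add(word[:i] + c + word[i + 1:])  # substitution
    for i in range(len(word) + 1):
        for c in ALPHABET:
            out.add(word[:i] + c + word[i:])  # insertion
    out.discard(word)
    out.discard("")
    return out


def score(candidate, original, w_sim=0.5, w_freq=0.5):
    """Weighted objective (assumed form): similarity to the OCR token plus lexicon frequency."""
    sim = SequenceMatcher(None, candidate, original).ratio()
    freq = LEXICON.get(candidate, 0) / max(LEXICON.values())
    return w_sim * sim + w_freq * freq


def hill_climb(original, max_steps=10):
    """Move to the best-scoring neighbor until no neighbor improves the score."""
    current = original
    current_score = score(current, original)
    for _ in range(max_steps):
        # Ties are broken by the candidate string to keep the search deterministic.
        best = max(neighbors(current), key=lambda c: (score(c, original), c))
        best_score = score(best, original)
        if best_score <= current_score:
            break  # local optimum reached
        current, current_score = best, best_score
    return current


if __name__ == "__main__":
    print(hill_climb("tcxt"))
```

In this sketch the weights are fixed by hand; the description indicates that the paper instead tests weight combinations heuristically, which could be approximated here by calling `score` with different `w_sim` and `w_freq` values and comparing the resulting corrections.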
