
Classification of incunable glyphs and out-of-distribution detection with joint energy-based models

International journal on document analysis and recognition, 2023-09, Vol.26 (3), p.223-240 [Peer Reviewed Journal]

The Author(s) 2023; ISSN: 1433-2833; EISSN: 1433-2825; DOI: 10.1007/s10032-023-00442-x

  • Title:
    Classification of incunable glyphs and out-of-distribution detection with joint energy-based models
  • Author: Kordon, Florian ; Weichselbaumer, Nikolaus ; Herz, Randall ; Mossman, Stephen ; Potten, Edward ; Seuret, Mathias ; Mayr, Martin ; Christlein, Vincent
  • Subjects: Computer Science ; Image Processing and Computer Vision ; Pattern Recognition ; Special Issue Paper
  • Is Part Of: International journal on document analysis and recognition, 2023-09, Vol.26 (3), p.223-240
  • Description: Optical character recognition (OCR) has proved a powerful tool for the digital analysis of printed historical documents. However, its ability to localize and identify individual glyphs is challenged by the tremendous variety in historical type design, the physicality of the printing process, and the state of conservation. We propose to mitigate these problems by a downstream fine-tuning step that corrects for pathological and undesirable extraction results. We implement this idea by using a joint energy-based model which classifies individual glyphs and simultaneously prunes potential out-of-distribution (OOD) samples like rubrications, initials, or ligatures. During model training, we introduce specific margins in the energy spectrum that aid this separation and explore the glyph distribution’s typical set to stabilize the optimization procedure. We observe strong classification at 0.972 AUPRC across 42 lower- and uppercase glyph types on a challenging digital reproduction of Johannes Balbus’ Catholicon, matching the performance of purely discriminative methods. At the same time, we achieve OOD detection rates of 0.989 AUPRC and 0.946 AUPRC for OOD ‘clutter’ and ‘ligatures’, respectively, which substantially improves upon recently proposed OOD detection techniques. The proposed approach can be easily integrated into the postprocessing phase of current OCR to aid reproduction and shape analysis research.
  • Publisher: Berlin/Heidelberg: Springer Berlin Heidelberg
  • Language: English
  • Identifier: ISSN: 1433-2833
    EISSN: 1433-2825
    DOI: 10.1007/s10032-023-00442-x
  • Source: Springer Nature OA/Free Journals
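
The description mentions classifying glyphs while pruning OOD samples by their position in the energy spectrum. As a rough illustration only, the sketch below uses the energy score commonly associated with joint energy-based models, E(x) = -log sum_k exp(f_k(x)), computed from classifier logits, and rejects a sample when its energy exceeds a threshold. It is not the authors' implementation: the function names, the example logits, and the threshold value are hypothetical, and the training-time energy margins described in the abstract are not reproduced here.

```python
import math

def energy(logits):
    """Energy E(x) = -log(sum_k exp(f_k(x))), computed in a numerically stable way."""
    m = max(logits)
    return -(m + math.log(sum(math.exp(v - m) for v in logits)))

def classify_or_reject(logits, threshold):
    """Return the index of the largest logit, or -1 if the energy exceeds the threshold.

    The threshold is a hypothetical, user-chosen value; in practice it would
    be tuned on held-out in-distribution and OOD samples.
    """
    if energy(logits) > threshold:
        return -1  # treated as out-of-distribution (e.g. clutter or a ligature)
    return max(range(len(logits)), key=lambda k: logits[k])

# Hypothetical logits: one confident glyph and one flat, low-evidence sample.
confident = [9.0, 0.5, -1.0]
ambiguous = [0.1, 0.0, -0.1]
results = [classify_or_reject(x, threshold=-3.0) for x in (confident, ambiguous)]
```

Lower energy corresponds to stronger evidence for the in-distribution classes, so the confident sample keeps its class label while the flat one is rejected.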
