Enabling Deep Document Image Analysis with Generative Models

ISBN: 9789180483049; ISBN: 9180483046; ISBN: 9789180483032; ISBN: 9180483038

Digital Resources/Online E-Resources

  • Title:
    Enabling Deep Document Image Analysis with Generative Models
  • Author: Nikolaidou, Konstantina
  • Subjects: Machine Learning ; Maskininlärning
  • Description: Historical documents are a valuable source of cultural knowledge and can provide information about previous events, societies, beliefs, and cultures. They can serve as an excellent source for research in various fields, including history, literature, linguistics, and anthropology. Their preservation and analysis pose significant challenges due to the unique characteristics of handwritten scripts, their variability, and document degradation. With the rise of the Deep Learning era, enormous amounts of annotated data are required to train large models that can efficiently perform tasks on unseen data. Nowadays, digital libraries provide high-quality digitized images for analysis and processing of historical documents. However, collecting and annotating the provided data is an expensive task and requires considerable expertise from historians and humanities scholars. Hence, generating synthetic data to enhance the performance of Deep Learning frameworks is a common approach in Computer Vision and, specifically in this thesis, in Document Image Analysis and Recognition (DIAR). This thesis leverages generative models to facilitate DIAR tasks, focusing on historical and handwritten documents, by generating realistic synthetic images that resemble a real distribution and enhance the training of downstream DIAR tasks. The contributions of the thesis include a systematic literature review, a comparative evaluation, and a developed method for handwriting generation. First, a systematic literature review of existing historical document image datasets provides summarized information on 65 studies, covering aspects such as statistics, document type, language, visual characteristics, and annotation. The study discusses promising resources for future research as well as limitations, namely the limited dataset size, the absence of benchmarks, and the lack of standardization in data format and evaluation scheme. 
A subsequent contribution is the integration of generated data in a historical document font classification task. Semi-synthetic data are generated with DocCreator, an open-source software tool, using several of its document degradation augmentations. A conditional Generative Adversarial Network (GAN) is used to generate fully synthetic data conditioned on a specific sample. The data generated by the two methods are integrated as additional samples in the training of several Convolutional Neural Network classifiers, and the effect on performance is examined. The final contribution of the thesis introduces a new method for generating styled handwritten text images based on Denoising Diffusion Probabilistic Models (DDPM), an approach unexplored in DIAR. The method captures stylistic and content characteristics of a standard multi-writer handwriting dataset and is more effective than GAN-based methods at enhancing writer identification and handwriting text recognition. The results demonstrate the potential of the generative method for enabling deep document image analysis and pave the way for further research. As a future direction, this work will aim to progress from generating word images to generating sentence and full document images by conditioning on the content, style, and layout of historical documents. Another future action will be to extend the proposed method to operate in a few-shot scheme for the writer style condition in order to generate unseen styles. Furthermore, future work will aim to leverage important features from pre-training with synthetic and real data in order to generalize to historical documents, which are a scarce resource, and to adjust the text encoding components to different languages and scripts. 
Finally, the ultimate goal of future work is to generate a massive synthetic historical document image database to fill the existing benchmark gap.
  • Creation Date: 2023
  • Language: English
  • Identifier: ISBN: 9789180483049
    ISBN: 9180483046
    ISBN: 9789180483032
    ISBN: 9180483038
  • Source: SWEPUB Freely available online
