
Dataset for Siswati: Parallel textual data for English and Siswati and monolingual textual data for Siswati

Data in brief, 2024-06, Vol.54, p.110325-110325, Article 110325 [Peer Reviewed Journal]

2024 The Author(s). ISSN: 2352-3409; EISSN: 2352-3409; DOI: 10.1016/j.dib.2024.110325; PMID: 38617020

  • Title:
    Dataset for Siswati: Parallel textual data for English and Siswati and monolingual textual data for Siswati
  • Author: Gaustad, Tanja ; McKellar, Cindy A. ; Puttkammer, Martin J.
  • Subjects: Human Language Technology ; Language corpora ; Machine translation ; Natural Language Processing ; South African languages ; Under-resourced languages
  • Is Part Of: Data in brief, 2024-06, Vol.54, p.110325-110325, Article 110325
  • Description: This data article presents a dataset for Siswati, a Bantu language of the Nguni group that is one of the eleven official South African languages and, together with English, an official language of Eswatini. The dataset contains parallel textual data for English and Siswati as well as monolingual data for Siswati. It was developed as training data for machine translation systems, specifically for the Autshumato machine translation project. Both corpora can also be used to develop and evaluate core Natural Language Processing (NLP) technologies for Siswati, and the data lends itself to corpus linguistic studies. The article describes how the data was collected, what types of text it contains, and what clean-up was done. It also provides an overview of the number of words in the datasets.
  • Publisher: Netherlands: Elsevier Inc
  • Language: English
  • Identifier: ISSN: 2352-3409
    EISSN: 2352-3409
    DOI: 10.1016/j.dib.2024.110325
    PMID: 38617020
  • Source: PubMed Central
    ROAD: Directory of Open Access Scholarly Resources
    DOAJ Directory of Open Access Journals
