Strategies for building wordnets for under-resourced languages : the case of African languages

Literator, 2017, Vol.38 (1), p.1-12 [Peer Reviewed Journal]

Copyright 2017 African Online Scientific Information Systems (Pty) Ltd t/a AOSIS. This work is licensed under a Creative Commons Attribution 4.0 International License. ISSN: 0258-2279; EISSN: 2219-8237; DOI: 10.4102/lit.v38i1.1351

  • Title:
    Strategies for building wordnets for under-resourced languages : the case of African languages
  • Author: Bosch, Sonja E. ; Griesel, Marissa
  • Subjects: African languages ; African wordnet ; Analysis ; Bilingual dictionaries ; Collaboration ; Computational linguistics ; Encyclopedias and dictionaries ; English language ; Language & Linguistics ; Language processing ; Linguistics ; Literary devices ; Literary translation ; Literature ; Multilingualism ; Natural language interfaces ; Natural language processing ; Ontology ; Semantic relations ; semi-automatic extraction ; Sotho languages ; Translation ; under-resourced languages ; Xhosa language ; Zulu language
  • Is Part Of: Literator, 2017, Vol.38 (1), p.1-12
  • Description: The African Wordnet Project (AWN) aims to build wordnets for five African languages: Setswana, isiXhosa, isiZulu, Sesotho sa Leboa (also referred to as Sepedi or Northern Sotho) and Tshivenda. Currently, the so-called expand model, based on the structure of the English Princeton WordNet (PWN), is used to develop the African Wordnets manually on an ongoing basis. This is labour-intensive work that must be performed by linguistic experts, guided by several considerations such as the level of lexicalisation of a term in the African language. Until now, linguists have been responsible for identifying and translating appropriate synsets without much help from electronic resources, because for African languages even basic resources such as machine-readable electronic bilingual wordlists are usually not freely available. Methods to speed up the manual development of synsets and ease the workload of the human language experts were recently investigated. These centred on using the minimal amount of information available in bilingual dictionaries to identify PWN synsets that should be included in the AWN, transferring information from the dictionaries to the wordnet, and presenting the candidate synsets to linguists for final approval and inclusion in the wordnets. In this article, we describe the methodology developed for building the African Wordnets, a potentially significant resource for natural language processing applications. We investigate the available resources that could be exploited and the resources that had to be developed, and explain initial results and future plans.
    Afrikaans abstract (English translation): Strategies for developing wordnets for resource-scarce languages: a case study of African languages. The African Wordnet Project (AWN) aims to develop wordnets for five African languages: Setswana, isiXhosa, isiZulu, Sesotho sa Leboa (also called Sepedi or Northern Sotho) and Tshivenda. The so-called expand model, which is based on the structure of the English Princeton WordNet (PWN), is currently used to extend the AWN manually on an ongoing basis. This method is very labour-intensive and must be carried out by linguists, who are guided by several criteria, such as the level of lexicalisation of a word and the suitability of the synset for the language. Until now, linguists have had to make these decisions without much support from electronic tools, since for many African languages not even basic resources such as freely available, machine-readable electronic bilingual wordlists exist. Methods to speed up the manual development of synsets and lighten the workload of the language specialists have recently received much attention. The experiments focused on harnessing the minimal resources that are available to identify PWN synsets that should be transferred to the AWN. Information from the bilingual wordlists is extracted in a meaningful way and presented to the linguists, who make the final selection. This article presents the methodology used to develop the AWN. The available resources that were used or developed in the various experiments are described, preliminary results are given and future plans are outlined.
  • Publisher: Potchefstroom: AOSIS
  • Language: English; Portuguese; Afrikaans
  • Identifier: ISSN: 0258-2279
    EISSN: 2219-8237
    DOI: 10.4102/lit.v38i1.1351
  • Source: SciELO
    ProQuest Central
    DOAJ Directory of Open Access Journals
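The Description above outlines how the expand model was sped up: the English lemmas of PWN synsets are looked up in a bilingual dictionary, synsets with dictionary evidence are proposed as candidates, and linguists make the final decision. The Python sketch below illustrates that general idea under stated assumptions; it is not the AWN project's implementation. The names `Synset`, `Candidate`, `propose_candidates` and `review`, the ranking rule (synsets with more matched lemmas first) and the small English-isiZulu data set are all hypothetical, and the `approve` callback merely stands in for a linguist's judgement.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    """Hypothetical minimal stand-in for a PWN synset: an ID plus its English lemmas."""
    synset_id: str
    lemmas: tuple[str, ...]


@dataclass
class Candidate:
    """A PWN synset proposed for an African wordnet, with its dictionary evidence."""
    synset_id: str
    matched_lemmas: list[str]
    translations: set[str]


def propose_candidates(synsets, bilingual_dict):
    """Return synsets with at least one English lemma found in the bilingual
    dictionary, strongest evidence (most matched lemmas) first."""
    candidates = []
    for synset in synsets:
        matched = [lemma for lemma in synset.lemmas if lemma in bilingual_dict]
        if not matched:
            continue  # no dictionary evidence: leave this synset for manual work
        translations = set()
        for lemma in matched:
            translations.update(bilingual_dict[lemma])
        candidates.append(Candidate(synset.synset_id, matched, translations))
    # Hypothetical ranking: more matched lemmas first, then by ID for a stable order.
    candidates.sort(key=lambda c: (-len(c.matched_lemmas), c.synset_id))
    return candidates


def review(candidates, approve):
    """Keep only the candidates that `approve` (a stand-in for the linguist)
    accepts; return a mapping of synset ID to sorted target-language lemmas."""
    return {c.synset_id: sorted(c.translations) for c in candidates if approve(c)}


# Toy data for illustration only (English -> isiZulu), not PWN or AWN content.
SYNSETS = [
    Synset("dog.n.01", ("dog", "domestic_dog")),
    Synset("house.n.01", ("house",)),
    Synset("cat.n.01", ("cat", "true_cat")),
]
EN_ZU = {"dog": {"inja"}, "domestic_dog": {"inja"}, "house": {"indlu"}}

candidates = propose_candidates(SYNSETS, EN_ZU)
approved = review(candidates, lambda c: c.synset_id == "dog.n.01")
print([c.synset_id for c in candidates])  # ['dog.n.01', 'house.n.01']
print(approved)  # {'dog.n.01': ['inja']}
```

Ranking by the number of matched lemmas is only one plausible heuristic: a synset whose synonyms all translate to the same word is arguably stronger evidence than a single match. A real pipeline would also have to deal with polysemous dictionary entries, which this sketch ignores.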