Urdu language processing: a survey

The Artificial intelligence review, 2017-03, Vol.47 (3), p.279-311 [Peer Reviewed Journal]

Springer Science+Business Media Dordrecht 2016; COPYRIGHT 2017 Springer; Artificial Intelligence Review is a copyright of Springer, 2017; ISSN: 0269-2821; EISSN: 1573-7462; DOI: 10.1007/s10462-016-9482-x

  • Title:
    Urdu language processing: a survey
  • Author: Daud, Ali ; Khan, Wahab ; Che, Dunren
  • Subjects: Artificial Intelligence ; Asian ; Boundaries ; Computational linguistics ; Computer Science ; Datasets ; Information retrieval ; Information storage and retrieval ; Language ; Language processing ; Languages ; Linguistics ; Morphology ; Multilingualism ; Natural language interfaces ; Natural language processing ; Orthography ; Plagiarism ; Resource sharing ; Speech recognition ; State of the art ; State-of-the-art reviews ; Stemming ; Surveys ; Tasks ; Urdu language ; Writing
  • Is Part Of: The Artificial intelligence review, 2017-03, Vol.47 (3), p.279-311
  • Description: Far more work has been done on natural language processing tasks for Western languages than for their Eastern counterparts, particularly South Asian languages. Western languages are termed resource-rich languages: core linguistic resources such as corpora, WordNet, dictionaries, gazetteers, and associated tools developed for Western languages are customarily available. Most South Asian languages are low-resource languages. Urdu, for example, is a South Asian language that is among the widely spoken languages of the subcontinent, yet owing to resource scarcity not enough work has been conducted for it. The core objective of this paper is to present a survey of the linguistic resources that exist for Urdu language processing (ULP), to highlight the different tasks in ULP, and to discuss the available state-of-the-art techniques. Overall, the paper attempts to describe in detail the recent increase in interest and the progress made in ULP research. First, the available datasets for Urdu are discussed. The characteristics of Urdu, resource sharing between Hindi and Urdu, and the orthography and morphology of Urdu are then described. Pre-processing activities such as stop word removal, diacritics removal, normalization, and stemming are illustrated. State-of-the-art research on tasks such as tokenization, sentence boundary detection, part-of-speech tagging, named entity recognition, parsing, and WordNet development is reviewed. In addition, the impact of ULP on application areas such as information retrieval, classification, and plagiarism detection is investigated. Finally, open issues and future directions for this new and dynamic area of research are provided. The goal of this paper is to organize ULP work so that it can provide a platform for future ULP research activities.
  • Publisher: Dordrecht: Springer Netherlands
  • Language: English
  • Identifier: ISSN: 0269-2821
    EISSN: 1573-7462
    DOI: 10.1007/s10462-016-9482-x
  • Source: ProQuest One Psychology
    ProQuest Central