Urdu language processing: a survey

The Artificial intelligence review, 2017-03, Vol.47 (3), p.279-311 [Peer Reviewed Journal]

Springer Science+Business Media Dordrecht 2016 ;COPYRIGHT 2017 Springer ;Artificial Intelligence Review is a copyright of Springer, 2017. ;ISSN: 0269-2821 ;EISSN: 1573-7462 ;DOI: 10.1007/s10462-016-9482-x


Title:
Urdu language processing: a survey
Author: Daud, Ali ; Khan, Wahab ; Che, Dunren
Subjects: Artificial Intelligence ; Asian ; Boundaries ; Computational linguistics ; Computer Science ; Datasets ; Information retrieval ; Information storage and retrieval ; Language ; Language processing ; Languages ; Linguistics ; Morphology ; Multilingualism ; Natural language interfaces ; Natural language processing ; Orthography ; Plagiarism ; Resource sharing ; Speech recognition ; State of the art ; State-of-the-art reviews ; Stemming ; Surveys ; Tasks ; Urdu language ; Writing
Is Part Of: The Artificial intelligence review, 2017-03, Vol.47 (3), p.279-311
Description: Extensive work has been done on natural language processing tasks for Western languages compared with their Eastern counterparts, particularly South Asian languages. Western languages are termed resource-rich languages: core linguistic resources (e.g. corpora, WordNet, dictionaries, gazetteers) and the associated tools developed for them are customarily available. Most South Asian languages are low-resource languages. Urdu, for example, is a South Asian language and one of the most widely spoken languages of the subcontinent, yet owing to resource scarcity not enough work has been conducted on it. The core objective of this paper is to present a survey of the linguistic resources that exist for Urdu language processing (ULP), to highlight the different tasks in ULP and to discuss the available state-of-the-art techniques. In doing so, the paper attempts to describe in detail the recent increase in interest in, and progress made by, ULP research. First, the available datasets for Urdu are discussed. The characteristics of Urdu, resource sharing between Hindi and Urdu, and Urdu orthography and morphology are then described. Pre-processing activities such as stop-word removal, diacritics removal, normalization and stemming are illustrated. State-of-the-art research is reviewed for tasks such as tokenization, sentence boundary detection, part-of-speech tagging, named entity recognition, parsing and WordNet development. In addition, the impact of ULP on application areas such as information retrieval, classification and plagiarism detection is investigated. Finally, open issues and future directions for this new and dynamic area of research are provided. The goal of this paper is to organize ULP work so that it can serve as a platform for future ULP research.
Publisher: Dordrecht: Springer Netherlands
Language: English
Identifier: ISSN: 0269-2821
EISSN: 1573-7462
DOI: 10.1007/s10462-016-9482-x
Source: ProQuest One Psychology
ProQuest Central