
PROBABILITY DISTRIBUTION OVER THE SET OF CLASSES IN ARABIC DIALECT CLASSIFICATION TASK

Nauchno-tekhnicheskiĭ vestnik informat͡s︡ionnykh tekhnologiĭ, mekhaniki i optiki, 2017-01, Vol.17 (1), p.110 [Peer Reviewed Journal]

Copyright St. Petersburg National Research University of Information Technologies, Mechanics and Optics Jan/Feb 2017 ;ISSN: 2226-1494 ;EISSN: 2500-0373 ;DOI: 10.17586/2226-1494-2017-17-1-110-116

  • Author: Durandin, O V ; Hilal, N R ; Strebkov, D Y ; Zolotykh, N Y
  • Subjects: Accounting ; Algorithms ; Annotations ; Classification ; Classifiers ; Dialects ; Machine learning ; Natural language processing ; Probability distribution ; Social networks ; Training
  • Description: Subject of Research. We propose an approach to machine learning classification problems that uses information about the probability distribution over the set of class labels in the training data. The algorithm is illustrated on a complex natural language processing task: the classification of Arabic dialects. Method. Each object in the training set is associated with a probability distribution over the set of class labels rather than with a single class label. The proposed approach takes this distribution into account when solving the classification problem, in order to improve the quality of the resulting classifier. Main Results. The approach is illustrated on automatic classification of Arabic dialects. The analyzed data were collected from the Twitter social network, contain marker words, and belong to six Arabic dialects (Saudi, Levantine, Algerian, Egyptian, Iraqi, and Jordanian) and to Modern Standard Arabic (MSA). The results show that taking probability distributions over the set of classes into account improves the quality of the resulting classifier. In the experiments, even a relatively naive use of the probability distributions raises the precision of the classifier from 44% to 67%. Practical Relevance. The approach and the corresponding algorithm can be used effectively when manual annotation by experts requires significant money and time, but a system of heuristic rules can be created instead. Implementing the proposed algorithm can significantly reduce data preparation costs without a substantial loss of classification precision.
  • Publisher: Saint Petersburg: St. Petersburg National Research University of Information Technologies, Mechanics and Optics
  • Language: Russian
  • Identifier: ISSN: 2226-1494
    EISSN: 2500-0373
    DOI: 10.17586/2226-1494-2017-17-1-110-116
  • Source: ProQuest Central
    DOAJ Directory of Open Access Journals
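
The Method part of the description (each training object carries a probability distribution over class labels instead of a single label) can be illustrated with a minimal sketch. This is not the authors' algorithm, which the record does not specify. It assumes one simple way to use such distributions: a multinomial naive Bayes classifier in which every document adds its token counts to every class, weighted by the probability assigned to that class. The function names, the class codes (EGY, LEV, MSA), and the placeholder tokens are all hypothetical.

```python
import math
from collections import defaultdict


def train_soft_nb(docs, soft_labels, classes, alpha=1.0):
    """Fit a multinomial naive Bayes model from soft (probabilistic) labels.

    docs        -- list of token lists
    soft_labels -- list of dicts mapping class -> probability (one per doc)
    classes     -- list of all class names
    alpha       -- Laplace smoothing constant (an assumed default)
    """
    class_mass = {c: 0.0 for c in classes}
    token_mass = {c: defaultdict(float) for c in classes}
    vocab = set()
    for tokens, dist in zip(docs, soft_labels):
        vocab.update(tokens)
        for c in classes:
            p = dist.get(c, 0.0)
            # Fractional counting: the document belongs to class c with weight p.
            class_mass[c] += p
            for t in tokens:
                token_mass[c][t] += p
    total = sum(class_mass.values())
    log_prior = {
        c: math.log((class_mass[c] + alpha) / (total + alpha * len(classes)))
        for c in classes
    }
    class_total = {c: sum(token_mass[c].values()) for c in classes}
    return {
        "classes": classes, "vocab": vocab, "alpha": alpha,
        "log_prior": log_prior, "token_mass": token_mass,
        "class_total": class_total,
    }


def predict(model, tokens):
    """Return the class with the highest smoothed log-posterior score."""
    alpha, vocab_size = model["alpha"], len(model["vocab"])
    scores = {}
    for c in model["classes"]:
        score = model["log_prior"][c]
        for t in tokens:
            if t not in model["vocab"]:
                continue  # ignore tokens never seen in training
            count = model["token_mass"][c].get(t, 0.0)
            score += math.log(
                (count + alpha) / (model["class_total"][c] + alpha * vocab_size)
            )
        scores[c] = score
    return max(scores, key=scores.get)


# Toy data with placeholder tokens; the distributions stand in for the output
# of heuristic labeling rules and are invented for illustration only.
classes = ["EGY", "LEV", "MSA"]
docs = [["w_egy", "w_shared"], ["w_lev", "w_shared"], ["w_msa", "w_shared"]]
soft_labels = [
    {"EGY": 0.8, "MSA": 0.2},
    {"LEV": 0.7, "MSA": 0.3},
    {"MSA": 0.9, "EGY": 0.1},
]
model = train_soft_nb(docs, soft_labels, classes)
```

Calling `predict(model, ["w_egy"])` on this toy model selects the class whose weighted counts best explain the token. Training on hard labels is the special case in which each distribution puts probability 1 on a single class.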
