Designing an information system for the electronic document management of a university: automatic classification of documents

Journal of physics. Conference series, 2022-03, Vol.2182 (1), p.012035 [Peer Reviewed Journal]

Published under licence by IOP Publishing Ltd. This work is published under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. ; ISSN: 1742-6588 ; EISSN: 1742-6596 ; DOI: 10.1088/1742-6596/2182/1/012035

  • Title:
    Designing an information system for the electronic document management of a university: automatic classification of documents
  • Author: Tkachenko, A L ; Denisova, L A
  • Subjects: Algorithms ; Automatic classification ; Classification ; Colleges & universities ; Document management ; Document management systems ; Electronic documents ; Feature extraction ; Machine learning
  • Is Part Of: Journal of physics. Conference series, 2022-03, Vol.2182 (1), p.012035
  • Description: Abstract. Automating document flow, including the automatic classification of documents, is important for the effective functioning of a university's educational environment. The article considers the classification of university documents by machine learning methods, with the aim of improving classification quality. Document preprocessing was carried out to isolate the significant words in each document, which increased classification accuracy. Two methods of extracting features from text are described, TF and TF-IDF, which identify keywords from the frequency of the words in a document. A modification of the TF-IDF method is proposed in which the importance of a word depends on its part of speech. This improved classification quality by highlighting only the significant words in documents. A classification algorithm is proposed that uses the support vector method to reduce the number of documents involved in classification and the k-nearest neighbors method to perform the classification. The algorithm is shown to outperform the described analogues in that it misclassifies fewer documents.
  • Publisher: Bristol: IOP Publishing
  • Language: English
  • Identifier: ISSN: 1742-6588
    EISSN: 1742-6596
    DOI: 10.1088/1742-6596/2182/1/012035
  • Source: Geneva Foundation Free Medical Journals at publisher websites
    IOPscience (Open Access)
    Institute of Physics Open Access Journal Titles
    ProQuest Central
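
The abstract describes TF, TF-IDF, and a TF-IDF variant in which a word's importance depends on its part of speech. The record does not give the formulas or the weights the authors used, so the following is only a minimal sketch under stated assumptions: the `POS_WEIGHTS` values, the function names, the sample documents, and the hand-written part-of-speech lookup are all hypothetical and serve only to illustrate the idea.

```python
import math
from collections import Counter

# Hypothetical part-of-speech weights (assumption: not taken from the article).
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.6, "ADJ": 0.4}


def term_frequency(tokens):
    """TF: relative frequency of each word within one document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def inverse_document_frequency(corpus):
    """IDF: log(N / df), where df is the number of documents containing the word."""
    n_docs = len(corpus)
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def pos_weighted_tfidf(tokens, idf, pos_tags, default_weight=0.0):
    """TF-IDF scaled by a per-word part-of-speech weight.

    Words whose tag is missing from POS_WEIGHTS get default_weight,
    so with the default of 0.0 they are effectively filtered out.
    """
    tf = term_frequency(tokens)
    return {
        word: tf[word] * idf.get(word, 0.0)
        * POS_WEIGHTS.get(pos_tags.get(word), default_weight)
        for word in tf
    }


# Illustrative, already-preprocessed documents (hypothetical data).
corpus = [
    ["order", "approve", "student", "transfer"],
    ["order", "approve", "annual", "budget"],
    ["schedule", "exam", "student", "group"],
]
# Hand-written tags standing in for a real part-of-speech tagger.
pos_tags = {
    "order": "NOUN", "approve": "VERB", "student": "NOUN",
    "transfer": "NOUN", "annual": "ADJ", "budget": "NOUN",
    "schedule": "NOUN", "exam": "NOUN", "group": "NOUN",
}

idf = inverse_document_frequency(corpus)
vectors = [pos_weighted_tfidf(doc, idf, pos_tags) for doc in corpus]
```

In this sketch a verb such as "approve" keeps only 60% of its plain TF-IDF score, while a noun keeps all of it. The resulting `vectors` could then be passed to whichever classifier is used downstream.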
