
Designing an information system for the electronic document management of a university: automatic classification of documents

Journal of physics. Conference series, 2022-03, Vol.2182 (1), p.12035 [Peer Reviewed Journal]

Published under licence by IOP Publishing Ltd. This work is published under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. ISSN: 1742-6588; EISSN: 1742-6596; DOI: 10.1088/1742-6596/2182/1/012035



Title: Designing an information system for the electronic document management of a university: automatic classification of documents
Author: Tkachenko, A L ; Denisova, L A
Subjects: Algorithms ; Automatic classification ; Classification ; Colleges & universities ; Document management ; Document management systems ; Electronic documents ; Feature extraction ; Machine learning
Is Part Of: Journal of physics. Conference series, 2022-03, Vol.2182 (1), p.12035
Description: Abstract: Automating document flow, including the automatic classification of documents, is essential to the effective functioning of a university's educational environment. The article considers the classification of university documents by machine learning methods, with the aim of improving classification quality. Documents were preprocessed to isolate the significant words in each one, which increased classification accuracy. Two methods of extracting features from text are described, TF and TF-IDF, which identify keywords from the frequency of the words in a document. A modification of the TF-IDF method is proposed in which the importance of a word depends on its part of speech; this improved classification quality by highlighting only the important, significant words in documents. A classification algorithm is suggested that uses the support vector method to reduce the number of documents involved in classification and the k-nearest neighbors method to perform the classification. The advantage of this algorithm over the described analogues is shown as a decrease in the number of misclassified documents.
Publisher: Bristol: IOP Publishing
Language: English
Identifier: ISSN: 1742-6588
EISSN: 1742-6596
DOI: 10.1088/1742-6596/2182/1/012035
Source: Geneva Foundation Free Medical Journals at publisher websites
IOPscience (Open Access)
Institute of Physics Open Access Journal Titles
ProQuest Central
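
The abstract describes TF-IDF features whose word importance depends on part of speech, but it does not give the formula. The following is a minimal Python sketch of one way such a weighting could look, not the authors' method. The weight table `POS_WEIGHTS`, the toy lexicon `pos_of`, the sample documents, and the function names are all assumptions introduced for illustration.

```python
import math
from collections import Counter

# Hypothetical part-of-speech weights; the abstract does not state actual values.
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.6, "ADJ": 0.4, "OTHER": 0.1}


def tf(tokens):
    """Term frequency: share of each word among the tokens of one document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def idf(corpus):
    """Inverse document frequency: log(N / number of documents containing the word)."""
    n_docs = len(corpus)
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def pos_weighted_tfidf(tokens, idf_table, pos_of):
    """TF-IDF scaled by an assumed per-part-of-speech weight."""
    scores = {}
    for word, freq in tf(tokens).items():
        tag = pos_of.get(word, "OTHER")
        weight = POS_WEIGHTS.get(tag, POS_WEIGHTS["OTHER"])
        scores[word] = freq * idf_table.get(word, 0.0) * weight
    return scores


# Toy, already-preprocessed documents (illustrative only).
docs = [
    ["order", "enrol", "student"],
    ["order", "approve", "schedule"],
    ["report", "student", "attendance"],
]

# Toy lexicon standing in for a real part-of-speech tagger.
pos_of = {
    "order": "NOUN", "student": "NOUN", "schedule": "NOUN",
    "report": "NOUN", "attendance": "NOUN",
    "enrol": "VERB", "approve": "VERB",
}

idf_table = idf(docs)
weights = pos_weighted_tfidf(docs[0], idf_table, pos_of)
```

In this sketch a word that appears in every document gets an IDF of zero and drops out, while the part-of-speech factor scales down words from less informative classes. The resulting vectors could then be passed to a classifier such as the support vector and k-nearest neighbors combination the abstract mentions.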
