Prevalence of neural collapse during the terminal phase of deep learning training

Proceedings of the National Academy of Sciences - PNAS, 2020-10, Vol.117 (40), p.24652-24663 [Peer Reviewed Journal]

Copyright National Academy of Sciences Oct 6, 2020 ;Copyright © 2020 the Author(s). Published by PNAS. 2020 ;ISSN: 0027-8424 ;EISSN: 1091-6490 ;DOI: 10.1073/pnas.2015509117 ;PMID: 32958680


Title: Prevalence of neural collapse during the terminal phase of deep learning training
Author: Papyan, Vardan ; Han, X. Y. ; Donoho, David L.
Subjects: Apexes ; Classification ; Classifiers ; Collapse ; Deep learning ; Physical Sciences ; Rescaling ; Scaling ; Training
Is Part Of: Proceedings of the National Academy of Sciences - PNAS, 2020-10, Vol.117 (40), p.24652-24663
Description: Modern practice for training classification deepnets involves a terminal phase of training (TPT), which begins at the epoch where training error first vanishes. During TPT, the training error stays effectively zero, while training loss is pushed toward zero. Direct measurements of TPT, for three prototypical deepnet architectures and across seven canonical classification datasets, expose a pervasive inductive bias we call neural collapse (NC), involving four deeply interconnected phenomena. (NC1) Cross-example within-class variability of last-layer training activations collapses to zero, as the individual activations themselves collapse to their class means. (NC2) The class means collapse to the vertices of a simplex equiangular tight frame (ETF). (NC3) Up to rescaling, the last-layer classifiers collapse to the class means, or in other words, to the simplex ETF (i.e., to a self-dual configuration). (NC4) For a given activation, the classifier’s decision collapses to simply choosing whichever class has the closest train class mean (i.e., the nearest class center [NCC] decision rule). The symmetric and very simple geometry induced by the TPT confers important benefits, including better generalization performance, better robustness, and better interpretability.
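As an illustration of NC2 through NC4 (a minimal sketch, not code from the paper), the Python below builds the canonical C-class simplex ETF, M = sqrt(C/(C-1)) (I - (1/C) 1 1^T), whose vectors have unit norm and pairwise inner products -1/(C-1). It then checks that a zero-bias linear classifier whose weights are a rescaled copy of those class means (NC3) picks the same class as the nearest class center rule (NC4). The helper names (simplex_etf, dot, ncc_predict, linear_predict), the choice C = 4, and the scale factors are hypothetical. The paper's ETF may additionally be rotated into the feature space, and the equivalence shown here assumes equal-norm class means and zero (or equal) biases.

```python
# Illustrative sketch only; not the authors' code. Helper names are hypothetical.
import math
import random


def simplex_etf(C):
    """Return the C vectors (rows) of the canonical simplex ETF in R^C.

    M = sqrt(C/(C-1)) * (I - (1/C) * ones ones^T). M is symmetric, so its rows
    equal its columns; each has unit norm and every distinct pair has inner
    product -1/(C-1).
    """
    scale = math.sqrt(C / (C - 1))
    return [[scale * ((1.0 if i == j else 0.0) - 1.0 / C) for j in range(C)]
            for i in range(C)]


def dot(a, b):
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def ncc_predict(h, class_means):
    """Nearest class center rule: index of the class mean closest to h."""
    def sq_dist(mu):
        return sum((x - m) ** 2 for x, m in zip(h, mu))
    return min(range(len(class_means)), key=lambda c: sq_dist(class_means[c]))


def linear_predict(h, weights):
    """Zero-bias linear classifier: index c maximizing <w_c, h>."""
    return max(range(len(weights)), key=lambda c: dot(weights[c], h))


if __name__ == "__main__":
    C = 4
    etf = simplex_etf(C)
    # Hypothetical class means: the ETF scaled to a common norm of 3.
    means = [[3.0 * x for x in v] for v in etf]
    # NC3 assumption: classifier weights equal the class means up to rescaling.
    weights = [[0.5 * x for x in mu] for mu in means]

    rng = random.Random(0)
    for _ in range(1000):
        h = [rng.gauss(0.0, 2.0) for _ in range(C)]
        # With equal-norm means, ||h - mu_c||^2 = ||h||^2 - 2<h, mu_c> + const,
        # so minimizing distance is the same as maximizing <w_c, h>.
        assert ncc_predict(h, means) == linear_predict(h, weights)
    print("NCC and linear decisions agree on all 1000 samples")
```

The identity in the loop comment is why the two rules coincide: once the class means share a norm, the only class-dependent term in the squared distance is the inner product with the activation.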
Publisher: Washington: National Academy of Sciences
Language: English
Identifier: ISSN: 0027-8424
EISSN: 1091-6490
DOI: 10.1073/pnas.2015509117
PMID: 32958680
Source: Geneva Foundation Free Medical Journals at publisher websites
PubMed Central
