Scaffolding a student to instill knowledge

Date
2023-04-25
Authors
Saligrama, Venkatesh
Kag, Anil
Acar, Durmus Alp Emre
Gangrade, Aditya
Citation
V. Saligrama, A. Kag, D.A.E. Acar, A. Gangrade. 2023. "Scaffolding a student to instill knowledge." ICLR 2023. https://openreview.net/group?id=ICLR.cc/2023/Conference
Abstract
We propose a novel knowledge distillation (KD) method that selectively instills teacher knowledge into a student model, motivated by situations where the student's capacity is significantly smaller than the teacher's. In vanilla KD, the teacher primarily sets a predictive target for the student to follow, and we posit that this target is overly optimistic given the student's limited capacity. We develop a novel scaffolding scheme in which the teacher, in addition to setting a predictive target, also scaffolds the student's prediction by censoring hard-to-learn examples. The student model uses the same information as in vanilla KD, namely the teacher's soft-max predictions, as inputs, and in this sense our proposal can be viewed as a natural variant of vanilla KD. We show on synthetic examples that censoring hard examples smooths the student's loss landscape, so that the student encounters fewer local minima and, as a result, generalizes better. Against vanilla KD we achieve improved performance, and we are comparable to more intrusive techniques that leverage feature matching on benchmark datasets.
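The sketch below is not the authors' released code; it is a minimal, assumed illustration of the idea the abstract describes: a vanilla KD objective in which examples the teacher itself finds hard are censored (masked out) of the distillation term. The confidence threshold `tau_conf`, the weighting `alpha`, and the use of teacher confidence as the hardness proxy are all assumptions made for illustration.

```python
# Hypothetical sketch (not the authors' method): vanilla KD loss with a simple
# censoring mask that drops examples the teacher predicts with low confidence.
import torch
import torch.nn.functional as F

def scaffolded_kd_loss(student_logits, teacher_logits, labels,
                       temperature=4.0, alpha=0.5, tau_conf=0.6):
    """KD loss in which low-confidence (hard) teacher examples are censored."""
    # Teacher soft targets and student log-probabilities at the KD temperature.
    t_soft = F.softmax(teacher_logits / temperature, dim=1)
    s_log_soft = F.log_softmax(student_logits / temperature, dim=1)

    # Censoring mask: keep only examples the teacher predicts confidently
    # (an assumed proxy for "hard-to-learn"; the paper's criterion may differ).
    keep = (t_soft.max(dim=1).values >= tau_conf).float()

    # Per-example KL divergence, masked and averaged over the kept examples.
    kd = F.kl_div(s_log_soft, t_soft, reduction="none").sum(dim=1)
    kd = (kd * keep).sum() / keep.sum().clamp(min=1.0)

    # Standard cross-entropy on ground-truth labels for all examples.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * (temperature ** 2) * kd + (1.0 - alpha) * ce
```

In this sketch the student still sees only the teacher's soft-max outputs, which is why the abstract frames the scheme as a natural variant of vanilla KD rather than a feature-matching method.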