Efficient edge inference by selective query
Date
2023-04-25
Authors
Saligrama, Venkatesh
Version
Published version
Citation
V. Saligrama. 2023. "Efficient edge inference by selective query" ICLR 2023. https://openreview.net/group?id=ICLR.cc/2023/Conference
Abstract
Edge devices provide inference on predictive tasks to many end-users. However,
deploying deep neural networks that achieve state-of-the-art accuracy on these
devices is infeasible due to edge resource constraints. Nevertheless, cloud-only processing,
the de facto standard, is also problematic, since uploading large amounts
of data imposes severe communication bottlenecks. We propose a novel end-to-end
hybrid learning framework that allows the edge to selectively query only those
hard examples that the cloud can classify correctly. Our framework optimizes over
neural architectures and trains edge predictors and routing models so that the overall
accuracy remains high while minimizing the overall latency. Training a hybrid
learner is difficult since we lack annotations of hard edge-examples. We introduce a
novel proxy supervision in this context and show that our method adapts seamlessly
and near optimally across different latency regimes. On the ImageNet dataset, our
proposed method deployed on a micro-controller unit exhibits a 25% reduction in
latency compared to cloud-only processing while suffering no excess loss.
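
The selective-query idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the toy predictions, and the additive latency model (every input pays the edge cost, and queried inputs additionally pay a cloud round trip) are assumptions made for illustration only. The sketch labels an example as worth querying when the edge prediction is wrong and the cloud prediction is right, which is one plausible reading of "hard examples that the cloud can classify correctly", then measures accuracy, query rate, and average latency under that routing.

```python
def proxy_routing_labels(edge_preds, cloud_preds, labels):
    """Return 1 where the edge is wrong and the cloud is right, else 0.

    Hypothetical proxy target for training a router; assumed here,
    not taken from the paper's code.
    """
    return [
        int(e != y and c == y)
        for e, c, y in zip(edge_preds, cloud_preds, labels)
    ]


def hybrid_stats(edge_preds, cloud_preds, labels, route,
                 edge_latency_ms, cloud_latency_ms):
    """Accuracy, query rate, and mean latency of a routed edge/cloud system.

    Assumed latency model: each input costs edge_latency_ms, and each
    queried input costs an additional cloud_latency_ms.
    """
    n = len(labels)
    # Use the cloud prediction only for inputs the router sends to the cloud.
    final = [c if r else e for e, c, r in zip(edge_preds, cloud_preds, route)]
    accuracy = sum(f == y for f, y in zip(final, labels)) / n
    query_rate = sum(route) / n
    avg_latency_ms = edge_latency_ms + query_rate * cloud_latency_ms
    return accuracy, query_rate, avg_latency_ms


# Toy data (made up): four examples, true labels and both models' predictions.
labels = [0, 1, 2, 3]
edge_preds = [0, 1, 0, 0]
cloud_preds = [0, 1, 2, 1]

route = proxy_routing_labels(edge_preds, cloud_preds, labels)
accuracy, query_rate, avg_latency_ms = hybrid_stats(
    edge_preds, cloud_preds, labels, route,
    edge_latency_ms=10.0, cloud_latency_ms=100.0,
)
cloud_only_accuracy = sum(c == y for c, y in zip(cloud_preds, labels)) / len(labels)
```

In this toy case the routed system matches the cloud-only accuracy while querying the cloud for only one of the four inputs. The fourth example is never queried because neither model gets it right, so sending it to the cloud would add latency without improving accuracy.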