Efficient edge inference by selective query

Saligrama, Venkatesh

Files

3068_efficient_edge_inference_by_se.pdf (3.66 MB)

Published version

Date

2023-04-25

Authors

Saligrama, Venkatesh

Version

Published version

URI

https://hdl.handle.net/2144/48708

Citation

V. Saligrama. 2023. "Efficient edge inference by selective query." ICLR 2023. https://openreview.net/group?id=ICLR.cc/2023/Conference

Abstract

Edge devices provide inference on predictive tasks to many end-users. However, deploying deep neural networks that achieve state-of-the-art accuracy on these devices is infeasible due to edge resource constraints. Cloud-only processing, the de facto standard, is also problematic, since uploading large amounts of data imposes severe communication bottlenecks. We propose a novel end-to-end hybrid learning framework that allows the edge to selectively query only those hard examples that the cloud can classify correctly. Our framework optimizes over neural architectures and trains edge predictors and routing models so that overall accuracy remains high while overall latency is minimized. Training a hybrid learner is difficult since we lack annotations of hard edge-examples. We introduce a novel proxy supervision in this context and show that our method adapts seamlessly and near-optimally across different latency regimes. On the ImageNet dataset, our proposed method deployed on a micro-controller unit exhibits a 25% reduction in latency compared to cloud-only processing while suffering no excess loss.
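
The selective-query idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`proxy_routing_labels`, `hybrid_predict`), the fixed `threshold`, and the specific labeling rule (route an example to the cloud when the edge model is wrong and the cloud model is right) are assumptions made for illustration, based only on the abstract's description of querying "hard examples that the cloud can classify correctly."

```python
from typing import Callable, List, Sequence, Tuple


def proxy_routing_labels(
    edge_preds: Sequence[int],
    cloud_preds: Sequence[int],
    targets: Sequence[int],
) -> List[int]:
    """Assumed proxy supervision for a router.

    Label is 1 (query the cloud) only when the edge prediction is wrong
    and the cloud prediction is right; otherwise 0 (stay on the edge).
    """
    return [
        int(e != y and c == y)
        for e, c, y in zip(edge_preds, cloud_preds, targets)
    ]


def hybrid_predict(
    x: int,
    edge_model: Callable[[int], int],
    cloud_model: Callable[[int], int],
    router: Callable[[int], float],
    threshold: float = 0.5,  # hypothetical operating point
) -> Tuple[int, bool]:
    """Return (prediction, queried_cloud) for a single input.

    The router score is compared with the threshold; only inputs scoring
    above it incur the cost of a cloud query.
    """
    if router(x) > threshold:
        return cloud_model(x), True
    return edge_model(x), False


# Toy stand-ins for the three models (purely illustrative).
edge_model = lambda x: x % 2                    # weak on-device predictor
cloud_model = lambda x: x                       # strong remote predictor
router = lambda x: 1.0 if x >= 2 else 0.0       # flags "hard" inputs

labels = proxy_routing_labels([0, 1, 2, 1], [0, 2, 2, 3], [0, 2, 1, 3])
easy = hybrid_predict(1, edge_model, cloud_model, router)
hard = hybrid_predict(3, edge_model, cloud_model, router)
```

In this sketch, raising `threshold` sends fewer inputs to the cloud, lowering communication latency at a possible cost in accuracy; that is one simple way to picture the latency regimes the abstract mentions.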

Collections

BU Open Access Articles
