This work studies cooperative inference of deep neural networks (DNNs), in which a memory-constrained end device performs a delay-constrained inference task with the aid of an edge server. Although several prior works have considered cooperative DNN inference, they assumed that the memory footprint available at the end device is unlimited, which is not realistic in practice. To address this issue, a memory-aware cooperative DNN inference scheme is proposed in this work. Specifically, knowledge distillation is adopted to obtain high-performing lightweight DNNs. To minimize the inference delay, we first analyze the end-to-end delay of the proposed cooperative DNN inference and then minimize it by jointly optimizing the DNN partitioning point and the intermediate-data transmission rate. In addition, a dynamic DNN selection scheme is developed that fully exploits the available memory resources in order to maximize inference accuracy. Experimental results demonstrate that the proposed cooperative DNN inference considerably outperforms comparable schemes while satisfying both the delay constraint and the memory constraint.
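To illustrate the kind of search the abstract describes, the following is a minimal sketch (not the paper's actual algorithm) of choosing a DNN partition point and a distilled candidate model so that the end-to-end delay and device memory constraints are satisfied while accuracy is maximized. All class names, helper functions, and profiling fields are hypothetical, and the transmission rate is taken as given here rather than jointly optimized as in the paper.

```python
# Minimal sketch, assuming per-layer profiles are available for each candidate
# (distilled) DNN. The paper additionally optimizes the transmission rate; here
# it is treated as a fixed input for simplicity.
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LayerProfile:
    device_ms: float       # compute time of this layer on the end device
    server_ms: float       # compute time of this layer on the edge server
    out_bytes: int         # size of the intermediate activation if we cut after this layer
    device_mem_bytes: int  # device memory needed to host this layer


@dataclass
class CandidateDNN:
    name: str
    accuracy: float        # validation accuracy of this (distilled) model
    input_bytes: int       # size of the raw input (sent if everything is offloaded)
    layers: List[LayerProfile]


def end_to_end_delay(model: CandidateDNN, cut: int, rate_bps: float) -> float:
    """Delay (ms) when layers [0, cut) run on the device and [cut, N) on the server."""
    device = sum(l.device_ms for l in model.layers[:cut])
    server = sum(l.server_ms for l in model.layers[cut:])
    if cut == len(model.layers):
        tx_bytes = 0                      # only the final prediction returns; treated as negligible
    elif cut == 0:
        tx_bytes = model.input_bytes      # full offloading: the raw input is transmitted
    else:
        tx_bytes = model.layers[cut - 1].out_bytes
    tx_ms = 8_000.0 * tx_bytes / rate_bps  # bytes -> bits, seconds -> milliseconds
    return device + tx_ms + server


def best_partition(model: CandidateDNN, rate_bps: float,
                   mem_budget: int, delay_budget_ms: float) -> Optional[Tuple[int, float]]:
    """Exhaustively search cut points; return the feasible one with the smallest delay."""
    best = None
    for cut in range(len(model.layers) + 1):
        mem = sum(l.device_mem_bytes for l in model.layers[:cut])
        if mem > mem_budget:
            break  # the on-device prefix already exceeds the memory footprint limit
        d = end_to_end_delay(model, cut, rate_bps)
        if d <= delay_budget_ms and (best is None or d < best[1]):
            best = (cut, d)
    return best


def select_dnn(candidates: List[CandidateDNN], rate_bps: float,
               mem_budget: int, delay_budget_ms: float) -> Optional[Tuple[str, int, float]]:
    """Among all feasible (model, cut) pairs, pick the most accurate candidate model."""
    feasible = []
    for m in candidates:
        res = best_partition(m, rate_bps, mem_budget, delay_budget_ms)
        if res is not None:
            feasible.append((m.accuracy, m.name, res[0], res[1]))
    if not feasible:
        return None  # no model satisfies both the delay and memory constraints
    _, name, cut, delay = max(feasible)
    return name, cut, delay
```

In this sketch the dynamic DNN selection is a simple outer loop over candidate distilled models, while the inner loop plays the role of the delay-minimizing partition-point optimization; the actual paper formulates and solves this jointly with the transmission rate.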