By orchestrating the resources of the edge and core networks, the delays of edge-assisted computing can be reduced. Offloading scheduling is challenging, however, especially in the presence of many edge devices with randomly varying link and computing conditions. This paper presents a new online learning-based approach to offloading scheduling, in which multi-agent multi-armed bandit (MA-MAB) learning is designed to exploit the randomly varying conditions and asymptotically minimize the computing delay. We first propose a combinatorial bandit upper confidence bound (CB-UCB) algorithm, where users collectively feed back the observed delays of all edge devices and links. An optimistic bound on the delay is derived to facilitate centralized offloading scheduling for all users. In addition, we put forth a distributed bandit upper confidence bound (DB-UCB) algorithm, where users take random turns to make conflict-free, distributed selections of edge devices. An optimistic confidence bound for each user is developed so that a user's selection relies only on its own observations and decisions. Furthermore, we establish the asymptotic optimality of the proposed algorithms by proving the sublinearity of their regrets, and we show that the random turns the users take to make decisions do not compromise the asymptotic optimality of the DB-UCB algorithm, as corroborated by numerical simulations.
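To make the UCB-style selection concrete, the following is a minimal, hypothetical sketch of the core idea behind the confidence-bound algorithms described above, reduced to a single user choosing among edge devices with unknown mean delays. The class name `UCBOffloader`, the delay distributions, and all parameters are illustrative assumptions, not the paper's actual CB-UCB or DB-UCB formulation (which handles multiple users and combinatorial feedback).

```python
import math
import random

class UCBOffloader:
    """Hypothetical single-user sketch of UCB-based offloading:
    track per-device mean delays and select the device with the most
    optimistic (lowest) confidence bound on delay."""

    def __init__(self, num_devices):
        self.counts = [0] * num_devices       # times each device was used
        self.mean_delay = [0.0] * num_devices  # running mean of observed delays
        self.t = 0                             # total rounds so far

    def select(self):
        self.t += 1
        # Try every device once before relying on confidence bounds.
        for d, n in enumerate(self.counts):
            if n == 0:
                return d
        # Optimistic lower bound on delay: mean minus an exploration bonus
        # that shrinks as a device is sampled more often.
        def lcb(d):
            bonus = math.sqrt(2.0 * math.log(self.t) / self.counts[d])
            return self.mean_delay[d] - bonus
        return min(range(len(self.counts)), key=lcb)

    def update(self, device, observed_delay):
        self.counts[device] += 1
        n = self.counts[device]
        self.mean_delay[device] += (observed_delay - self.mean_delay[device]) / n

# Usage: simulate three devices with different (unknown to the agent) mean delays.
random.seed(0)
true_means = [0.5, 0.2, 0.8]
agent = UCBOffloader(len(true_means))
for _ in range(2000):
    d = agent.select()
    agent.update(d, random.expovariate(1.0 / true_means[d]))
best = min(range(len(true_means)), key=lambda d: agent.mean_delay[d])
```

In this toy setting the agent concentrates its offloading on the lowest-delay device over time; the sublinear-regret guarantee the paper proves is the formal version of this behavior for the full multi-user system.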