With the rapid development of edge data intelligence, jointly optimizing task offloading (TO) and resource allocation (RA) in multiaccess edge computing networks can significantly improve the Quality of Service (QoS). However, in online scenarios, traditional methods (e.g., game theory and numerical optimization) cannot adapt to dynamic environments. Deep reinforcement learning (DRL) can instead adjust the policy online to obtain long-term rewards. Nevertheless, because the joint TO and RA problem is nonconvex and NP-hard, existing DRL methods cannot guarantee high efficiency due to the large action space. To address this problem, we propose a fast-learning DRL framework with an integrated reward and a reduced action space (FLIRRAS), which jointly optimizes TO and RA strategies at low complexity. FLIRRAS combines DRL with numerical methods to iteratively determine the discrete TO decision and the continuous RA. Specifically, a deep neural network (DNN) learns environmental information and provides prior knowledge for the offloading decision. Furthermore, a novel reward that integrates the utilities of TO and RA is designed to drive the agent toward the optimal policy. To cope with the otherwise large action space, low-complexity convex optimization methods, i.e., subgradient projection and the KKT conditions, are used to supplement and adjust the decision, reducing both the network parameters and the decision space. In addition, given the dynamic online environment, we introduce an experience replay mechanism in which the policy is updated regularly to track the best mapping from states to decisions. Experimental results show that FLIRRAS outperforms greedy and other DRL approaches, improving execution time by over 18.0% compared with the latest DRL method.
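The pipeline the abstract describes (a DNN providing prior offloading decisions, a low-complexity convex step handling the continuous resource allocation, and experience replay for online updates) can be pictured with the minimal PyTorch sketch below. This is an illustration of the general technique only, not the paper's implementation: every class and function name is hypothetical, the proportional allocation is a simple closed-form stand-in for the subgradient-projection/KKT step, and the toy training loop is assumed.

```python
# Illustrative-only sketch: a DNN proposes discrete offloading decisions,
# a closed-form convex step allocates the continuous resource, and an
# experience replay buffer drives periodic policy updates.
# All names below are hypothetical and not taken from the paper.
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn


class OffloadingDNN(nn.Module):
    """Maps the environment state (e.g., channel gains) to relaxed offloading scores."""

    def __init__(self, state_dim: int, num_tasks: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, num_tasks), nn.Sigmoid(),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def quantize(scores: torch.Tensor) -> np.ndarray:
    """Round relaxed scores to a binary TO decision; only this discrete part
    is learned, which keeps the action space small."""
    return (scores.detach().cpu().numpy() > 0.5).astype(np.float32)


def allocate_resources(decision: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Proportional split of a unit resource budget among offloaded tasks:
    a closed-form stand-in for the subgradient-projection / KKT step."""
    masked = decision * weights
    total = masked.sum()
    return masked / total if total > 0 else np.zeros_like(masked)


def train_step(dnn: OffloadingDNN, optimizer: torch.optim.Optimizer, batch) -> float:
    """Refit the DNN on replayed (state, decision) pairs."""
    states = torch.tensor(np.stack([s for s, _ in batch]), dtype=torch.float32)
    targets = torch.tensor(np.stack([a for _, a in batch]), dtype=torch.float32)
    loss = nn.functional.binary_cross_entropy(dnn(states), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    state_dim, num_tasks = 10, 10
    dnn = OffloadingDNN(state_dim, num_tasks)
    optimizer = torch.optim.Adam(dnn.parameters(), lr=1e-3)
    replay: deque = deque(maxlen=1024)  # experience replay buffer

    for step in range(200):
        state = np.random.rand(state_dim).astype(np.float32)  # toy online state
        scores = dnn(torch.from_numpy(state))
        decision = quantize(scores)                                    # discrete TO
        allocation = allocate_resources(decision, state[:num_tasks])   # continuous RA
        # A real integrated reward would combine the TO and RA utilities here;
        # this toy loop simply stores the pair for replay.
        replay.append((state, decision))
        if step % 16 == 0 and len(replay) >= 32:
            train_step(dnn, optimizer, random.sample(list(replay), 32))
```

The design point this sketch tries to convey is the one the abstract emphasizes: only the discrete TO decision is learned by the network, while the continuous RA is resolved analytically, which is what shrinks the network's output and hence the decision space.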
               