
FLIRRAS: Fast Learning With Integrated Reward and Reduced Action Space for Online Multitask Offloading


With the rapid development of edge data intelligence, task offloading (TO) and resource allocation (RA) optimization in multiaccess edge computing networks can significantly improve the Quality of Service (QoS). However, in the online scenario, traditional methods (e.g., game theory and numerical methods) cannot adapt to dynamic environments. Deep reinforcement learning (DRL) can be applied to adjust the policy for long-term rewards. Nevertheless, since the joint TO and RA problem is nonconvex and NP-hard, existing DRL methods cannot guarantee high efficiency because of the large action space. To solve the above problem, we propose a fast learning with integrated reward and reduced action space-based DRL framework (FLIRRAS), which adopts a low-complexity approach to jointly optimize TO and RA strategies. The FLIRRAS framework combines DRL with numerical methods to iteratively pursue the discrete TO and continuous RA decisions. Specifically, a deep neural network (DNN) is used to learn environmental information, providing prior knowledge for the offloading decision. Furthermore, a novel reward integrating the utility of TO and RA is designed to motivate the agent to find the optimal policy. To address the overly large action space, low-complexity convex optimization methods, i.e., subgradient projection and the KKT conditions, are used to supplement and adjust the decision, which reduces the network parameters and the decision space. In addition, given the dynamic online environment, we introduce the experience replay mechanism, where the policy is updated regularly to reflect the best mapping between states and actions. The experimental results show that FLIRRAS outperforms greedy and other DRL approaches, beating the latest DRL method by over 18.0% in terms of execution time.
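The decomposition described above can be illustrated with a toy sketch. Everything below is a hypothetical, simplified stand-in (not the paper's actual algorithm or network): a scoring function takes the place of the DNN proposing the discrete offloading decision, a closed-form equal split over the offloaded subset stands in for the KKT-based resource allocation on the reduced action space, a single scalar combines TO and RA utility as the integrated reward, and a deque serves as the experience replay buffer.

```python
from collections import deque

def propose_offloading(sizes, threshold=0.5):
    """Stand-in for the DNN: offload a task when its normalized size
    exceeds a threshold. Real FLIRRAS learns this from the environment;
    the scoring rule here is purely illustrative."""
    m = max(sizes)
    return [1 if s / m >= threshold else 0 for s in sizes]

def allocate_bandwidth(decision, total_bw=1.0):
    """KKT-style closed form for the continuous RA subproblem: with
    identical concave (log) utilities, the stationarity conditions give
    an equal split over the offloaded subset only, so the optimizer
    never searches over bandwidth for tasks kept local -- this is the
    'reduced action space' idea in miniature."""
    k = sum(decision)
    if k == 0:
        return [0.0] * len(decision)
    return [total_bw / k if d else 0.0 for d in decision]

def integrated_reward(sizes, decision, bw, local_rate=0.2):
    """Single reward coupling TO and RA: negative total completion time,
    using the allocated bandwidth for offloaded tasks and a fixed local
    processing rate otherwise. Higher is better."""
    r = 0.0
    for s, d, b in zip(sizes, decision, bw):
        rate = b if d else local_rate
        r += -s / rate
    return r

# Experience replay: store (state, action, reward) tuples so the policy
# can be refreshed periodically in the dynamic online environment.
replay = deque(maxlen=100)

sizes = [0.3, 0.9, 0.6, 0.2]              # toy task data sizes (state)
decision = propose_offloading(sizes)       # discrete TO action from the "DNN"
bw = allocate_bandwidth(decision)          # continuous RA via closed form
reward = integrated_reward(sizes, decision, bw)
replay.append((tuple(sizes), tuple(decision), reward))
```

With these toy numbers, the two largest tasks are offloaded and share the bandwidth equally, while the small tasks run locally; the reward penalizes the joint completion time so neither subproblem is optimized in isolation.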

Keywords: fast learning; action space; learning integrated; space; reward

Journal Title: IEEE Internet of Things Journal
Year Published: 2023


