
PPOAccel: A High-Throughput Acceleration Framework for Proximal Policy Optimization

Reinforcement Learning (RL) is a major branch of AI that enables agents to learn optimal decision making through interaction with the environment. Proximal Policy Optimization (PPO) is the state-of-the-art policy-optimization-based RL algorithm, achieving superior overall performance on various benchmarks. A PPO agent iteratively optimizes its policy, a function approximated by a DNN that chooses optimal actions, with each iteration consisting of two computationally intensive phases: Sample Generation, where the agent performs inference with its policy and interacts with the environment to collect data, and Model Update, where the policy is trained using the collected data. In this paper, we develop the first high-throughput PPO accelerator on a CPU-FPGA heterogeneous platform. Our unified systolic-array-based design accelerates both the inference and the training of the deep neural network used in an RL algorithm, and is generalizable to various MLP and CNN models across a wide range of RL applications. We develop novel optimizations that simultaneously reduce data-access and computation latencies, specifically: (a) optimal dataflow mapping to the systolic array, (b) a novel memory-blocked data layout that enables streaming, stall-free data access in both forward and backward propagation, and (c) a systolic-array compute-sharing technique to mitigate load imbalance in the training of two networks. We evaluate our design on widely used robotics and gaming benchmarks, achieving 1.4×–26× and 1.3×–2.7× throughput improvements, respectively, compared with state-of-the-art CPU and CPU-GPU implementations.
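
For context, the following is a minimal NumPy sketch of PPO's clipped surrogate objective, the quantity optimized during the Model Update phase. It is not the authors' accelerator code, and the dummy batch merely stands in for trajectory data that the Sample Generation phase would collect.

    import numpy as np

    def ppo_clip_objective(new_logp, old_logp, advantage, eps=0.2):
        """Clipped surrogate objective L^CLIP that Model Update maximizes."""
        ratio = np.exp(new_logp - old_logp)              # pi_new(a|s) / pi_old(a|s)
        clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)   # keep the policy update proximal
        return np.minimum(ratio * advantage, clipped * advantage).mean()

    # Dummy batch standing in for data collected during Sample Generation.
    rng = np.random.default_rng(0)
    old_logp = rng.normal(size=64)
    new_logp = old_logp + 0.05 * rng.normal(size=64)     # slightly updated policy
    advantage = rng.normal(size=64)

    print(ppo_clip_objective(new_logp, old_logp, advantage))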

Keywords: policy optimization; high throughput; proximal policy; policy

Journal Title: IEEE Transactions on Parallel and Distributed Systems
Year Published: 2022
