
Co-Concurrency Mechanism for Multi-GPUs in Distributed Heterogeneous Environments


The high concurrency and high throughput of graphics processing units (GPUs) have led researchers to keep using them to optimize distributed parallel computing architectures. As processor architectures have evolved, GPUs now allow multiple kernels to execute concurrently through stream queues. However, because hardware characteristics and kernel properties differ across distributed architectures, existing research lacks careful consideration of optimization schemes for concurrent streams and kernel block sizes. Unreasonable configuration of stream concurrency and kernel block size prolongs execution time and wastes computing resources during application execution. We therefore propose a multi-GPU multi-stream co-concurrency mechanism (MGSC) for distributed heterogeneous environments, which dynamically adjusts the number of concurrent streams and searches for the optimal block size during task scheduling. Based on the memory resources and startup overhead consumed by concurrent stream scheduling, we propose a resource-aware adaptive adjustment mechanism that dynamically tunes the number of streams. To find the optimal block size, we cast the problem as a multi-armed bandit (MAB) and propose a block-size adjustment algorithm based on the upper confidence bound (UCB). We implement MGSC on Spark 3.1.1 and NVIDIA CUDA 11.2 and conduct comparative experiments with multiple typical benchmarks to evaluate its performance. The experimental results show that the algorithm makes full use of GPU computing power and significantly reduces task execution time.
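As a rough illustration of the two ideas the abstract combines, the CUDA sketch below (not taken from the paper) launches independent kernel instances on separate streams so they can overlap, and picks the block size for each round with a simple UCB-1 bandit. The saxpy kernel, the candidate block sizes, and the reward definition (the reciprocal of the measured elapsed time) are illustrative assumptions, not MGSC's actual scheduling policy; MGSC additionally adapts the number of streams to memory and startup overhead, which is not sketched here.

// Illustrative sketch only: concurrent kernels on multiple CUDA streams,
// with the block size for each round picked by a simple UCB-1 bandit.
// Kernel, reward (1 / elapsed ms), and candidate block sizes are assumptions.
#include <cuda_runtime.h>
#include <cstdio>
#include <cmath>
#include <cfloat>

__global__ void saxpy(float a, const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// UCB-1 over a small set of candidate block sizes ("arms").
struct UcbTuner {
    static const int kArms = 4;
    int    sizes[kArms] = {128, 256, 512, 1024};
    int    pulls[kArms] = {0};
    double mean[kArms]  = {0.0};   // running mean reward per arm
    int    total        = 0;

    int select() {
        for (int a = 0; a < kArms; ++a)        // try every arm once first
            if (pulls[a] == 0) return a;
        int best = 0; double bestScore = -DBL_MAX;
        for (int a = 0; a < kArms; ++a) {
            double bonus = sqrt(2.0 * log((double)total) / pulls[a]);
            double score = mean[a] + bonus;    // exploitation + exploration
            if (score > bestScore) { bestScore = score; best = a; }
        }
        return best;
    }
    void update(int a, double reward) {
        ++pulls[a]; ++total;
        mean[a] += (reward - mean[a]) / pulls[a];  // incremental mean update
    }
};

int main() {
    const int n = 1 << 24, nStreams = 4, rounds = 16;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    cudaStream_t streams[nStreams];
    for (int s = 0; s < nStreams; ++s) cudaStreamCreate(&streams[s]);

    UcbTuner tuner;
    const int chunk = n / nStreams;
    for (int r = 0; r < rounds; ++r) {
        int arm   = tuner.select();
        int block = tuner.sizes[arm];
        cudaEvent_t start, stop;
        cudaEventCreate(&start); cudaEventCreate(&stop);
        cudaEventRecord(start);
        // One kernel per stream so independent chunks can execute concurrently.
        for (int s = 0; s < nStreams; ++s) {
            int grid = (chunk + block - 1) / block;
            saxpy<<<grid, block, 0, streams[s]>>>(2.0f, x + s * chunk,
                                                  y + s * chunk, chunk);
        }
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        tuner.update(arm, 1.0 / ms);           // shorter time => higher reward
        cudaEventDestroy(start); cudaEventDestroy(stop);
        printf("round %2d: block=%4d, %.3f ms\n", r, block, ms);
    }

    for (int s = 0; s < nStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(x); cudaFree(y);
    return 0;
}

Over repeated rounds the bandit converges toward the block size with the best observed reward while still occasionally exploring the others, which is the same exploration/exploitation trade-off the UCB-based block-size adjustment in the abstract relies on.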

Keywords: distributed heterogeneous; concurrency; concurrency mechanism; block; block size

Journal Title: IEEE Transactions on Parallel and Distributed Systems
Year Published: 2022
