GPUs have become an indispensable computing platform in data centers, and co-locating multiple applications on the same GPU is widely used to improve resource utilization. However, performance interference due to uncontrolled resource contention severely degrades the performance of co-located applications and fails to deliver a satisfactory user experience. In this paper, we present SMGuard, a software approach to flexibly manage the GPU resource usage of multiple applications under co-location. We also propose a capacity-based GPU resource model, CapSM, which provisions GPU resources at a fine granularity among co-located applications. When latency-sensitive applications are co-located with batch applications, SMGuard prevents the batch applications from occupying resources without constraint using a quota-based mechanism, and guarantees the resource usage of latency-sensitive applications with a reservation-based mechanism. In addition, SMGuard supports dynamic resource adjustment by evicting the running thread blocks of batch applications to release the occupied resources and remapping the uncompleted thread blocks onto the remaining resources, which avoids relaunching the preempted kernel. SMGuard is a pure software solution that does not rely on special GPU hardware or programming models, making it easy to adopt on commodity GPUs in data centers. Our evaluation shows that SMGuard improves the average performance of latency-sensitive applications by 9.8× when they are co-located with batch applications, while improving GPU utilization by 35 percent on average.
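The abstract does not give implementation details, but the quota/eviction mechanism it describes is commonly realized with persistent-worker thread blocks. The CUDA sketch below is a minimal illustration of that general pattern, not SMGuard's actual code: a batch kernel whose resident blocks claim "logical" block indices from a global counter and check a host-visible eviction flag between work units, so SM resources can be reclaimed without relaunching the kernel. All names and values here (batch_kernel, quota_blocks, evict_flag, the placeholder computation) are illustrative assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Persistent-worker batch kernel (illustrative sketch). Each resident thread
// block repeatedly claims a logical block index from a global counter and
// processes it. When the host sets the eviction flag, blocks stop claiming
// work and exit after their current unit, freeing their SM resources; the
// unclaimed logical blocks stay in the counter range and can be picked up by
// the surviving workers, so the kernel never has to be relaunched.
__global__ void batch_kernel(float *data,
                             unsigned int num_logical_blocks,
                             unsigned int *next_block,
                             volatile int *evict_flag)
{
    __shared__ unsigned int my_block;
    for (;;) {
        if (threadIdx.x == 0) {
            // Stop claiming work if eviction has been requested.
            my_block = (*evict_flag != 0) ? num_logical_blocks
                                          : atomicAdd(next_block, 1u);
        }
        __syncthreads();
        if (my_block >= num_logical_blocks)
            return;  // this block exits and its SM slots become free

        // Placeholder work for one logical block.
        unsigned int idx = my_block * blockDim.x + threadIdx.x;
        data[idx] *= 2.0f;
        __syncthreads();
    }
}

int main()
{
    const unsigned int logical_blocks = 4096, threads = 256;

    float *d_data;
    unsigned int *d_next;
    cudaMalloc(&d_data, (size_t)logical_blocks * threads * sizeof(float));
    cudaMemset(d_data, 0, (size_t)logical_blocks * threads * sizeof(float));
    cudaMalloc(&d_next, sizeof(unsigned int));
    cudaMemset(d_next, 0, sizeof(unsigned int));

    // Zero-copy pinned flag the host can flip while the kernel is running.
    int *h_evict, *d_evict;
    cudaHostAlloc(&h_evict, sizeof(int), cudaHostAllocMapped);
    *h_evict = 0;
    cudaHostGetDevicePointer(&d_evict, h_evict, 0);

    // Launch only as many resident blocks as the batch job's quota allows
    // (quota_blocks = 32 is an assumed value, not from the paper).
    unsigned int quota_blocks = 32;
    batch_kernel<<<quota_blocks, threads>>>(d_data, logical_blocks,
                                            d_next, d_evict);

    // A co-location manager could later reclaim SMs for a latency-sensitive
    // kernel by requesting eviction: *h_evict = 1;
    cudaDeviceSynchronize();

    cudaFreeHost(h_evict);
    cudaFree(d_next);
    cudaFree(d_data);
    return (cudaGetLastError() == cudaSuccess) ? 0 : 1;
}
```

In this style, the quota is simply the number of resident blocks launched, a reservation corresponds to SM capacity deliberately left unclaimed for latency-sensitive kernels, and eviction plus later remapping of the unfinished logical blocks replaces full kernel preemption and relaunch.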
               