"Toward QoS-Awareness and Improved Utilization of Spatial Multitasking GPUs"

Datacenters use GPUs to provide the significant computing throughput required by emerging user-facing services. The diurnal user access pattern of user-facing services provides a strong incentive to co-located applications for better GPU utilization, and prior work has focused on enabling co-location on multicore processors and traditional non-preemptive GPUs. However, current GPUs are evolving towards spatial multitasking and introduce a new set of challenges to eliminate QoS violations. To address this open problem, we explore the underlying causes of QoS violation on spatial multitasking GPUs. In response to these causes, we propose C-Laius, a runtime system that carefully allocates the computation resource to co-located applications for maximizing the throughput of batch applications while guaranteeing the required QoS of user-facing services. C-Laius not only allows co-locating one user-facing application with multiple batch applications, but also supports the co-location of multiple user-facing applications with batch applications. In the case of a single co-located user-facing application, our evaluation on an Nvidia RTX 2080Ti GPU shows that C-Laius improves the utilization of spatial multitasking GPUs by 20.8 percent, while achieving the 99%-ile latency target for user-facing services. As to the case of multiple co-located user-facing applications, C-Laius ensures no violation of QoS while improving the accelerator utilization by 35.9 percent on average.

Keywords: user facing; gpus; multitasking gpus; facing services; spatial multitasking

Journal Title: IEEE Transactions on Computers
Year Published: 2022

Link to full text (if available)

Share on Social Media: Sign Up to like & get
recommendations!
1

LAUSR

You are not signed in:

Sign Up!

Related content

More Information News Social Media Video Recommended