Integrating accelerators onto the same chip as CPUs, sharing the last-level cache (LLC), is beneficial when CPUs and accelerators frequently exchange data. However, if shared data exceeds LLC capacity, expensive spills to and refetches from DRAM are incurred, limiting the benefits of such integrated architectures. While this can be avoided through careful software optimizations, such as fine-grain data tiling and accelerator synchronization, doing so requires significant software changes and programmer effort. In this article, we introduce CASPHAr, an LLC architecture that performs automatic, software- and hardware-transparent fine-grain data staging and synchronization between CPUs and accelerators in hardware. CASPHAr tracks and synchronizes producer and consumer accesses at cache-line granularity. As soon as some fraction of the shared data is produced and becomes ready in the LLC, it is delivered to the waiting consumer for processing. Furthermore, CASPHAr extends existing replacement policies to leverage synchronization information for making better eviction decisions. Combined, these mechanisms reduce data spills caused by unnecessarily long lifetimes of shared data in the cache. In addition, fine-grain staging and synchronization inherently achieve system-level pipelining of interdependent kernels that can outperform software optimizations. Results show that CASPHAr can boost performance by up to 23% and achieve energy savings of up to 22% over the accelerated baselines.
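The pipelining benefit of fine-grain staging can be illustrated with a simple timing model. The sketch below is a hypothetical software analogue, not the paper's hardware design: all names and cost parameters (`PRODUCE_COST`, `CONSUME_COST`, the chunk size) are illustrative assumptions. It contrasts a coarse-grain scheme, where the consumer waits for the entire shared buffer, with a fine-grain scheme, where the consumer is released as soon as each chunk of cache lines becomes ready.

```python
# Hypothetical timing-model sketch of fine-grain vs. coarse-grain staging
# between a producer and a consumer kernel. Costs are assumed, not measured.

PRODUCE_COST = 2   # assumed time units to produce one cache line
CONSUME_COST = 3   # assumed time units to consume one cache line

def coarse_grain(n_lines):
    """Consumer starts only after the entire buffer is produced."""
    produce_done = n_lines * PRODUCE_COST
    return produce_done + n_lines * CONSUME_COST

def fine_grain(n_lines, chunk=1):
    """Consumer is released as soon as each chunk of lines is ready."""
    t_ready = 0      # time at which the next chunk is fully produced
    t_consumer = 0   # time at which the consumer becomes free
    for start in range(0, n_lines, chunk):
        lines = min(chunk, n_lines - start)
        t_ready += lines * PRODUCE_COST
        # The consumer begins a chunk once it is both free and the data
        # is ready, overlapping its work with ongoing production.
        t_consumer = max(t_consumer, t_ready) + lines * CONSUME_COST
    return t_consumer

if __name__ == "__main__":
    n = 64
    print("coarse-grain total time:", coarse_grain(n))   # 320
    print("fine-grain total time:  ", fine_grain(n))     # 194
```

Under this model the consumer becomes the bottleneck after the first chunk, so fine-grain staging hides nearly all of the production time, mirroring the system-level pipelining of interdependent kernels described above.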