
Custom Multicache Architectures for Heap Manipulating Programs

Memory-intensive implementations often require access to an external, off-chip memory, which can substantially slow down a field-programmable gate array (FPGA) accelerator due to memory bandwidth limitations. Buffering frequently reused data on chip is a common approach to address this problem, and the optimization of the cache architecture introduces yet another complex design space. This paper presents a high-level synthesis (HLS) design aid that automatically generates parallel multicache systems tailored to the specific requirements of the application. Our program analysis identifies non-overlapping memory regions, which are supported by private caches, and regions that are shared by parallel units after parallelization, which are supported by coherent caches and synchronization primitives. It also decides whether the parallelization is legal with respect to data dependencies. The novelty of this paper is, first, its focus on programs using dynamically allocated, pointer-based data structures which, while common in software engineering, remain difficult to analyze and are beyond the scope of the overwhelming majority of HLS techniques to date. Second, we devise a high-level cache performance estimation to find a heterogeneous configuration of cache sizes that maximizes the performance of the multicache system subject to an on-chip memory resource constraint. We demonstrate our technique with three case studies of applications using dynamic data structures and use Xilinx Vivado HLS as an exemplary HLS tool. We show up to 15× speed-up after parallelization of the HLS implementations and the insertion of the application-specific distributed hybrid multicache architecture.
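
To illustrate the class of programs the abstract describes, the following is a minimal, hypothetical C sketch (not taken from the paper): two dynamically allocated linked lists occupy disjoint heap regions, so an analysis of the kind outlined above could prove that the two traversals never overlap, allowing each parallel unit to be served by a private cache, while any state written by both units would need a coherent cache and synchronization. All names are illustrative.

/* Hypothetical example: two linked lists allocated in disjoint heap
 * regions. A region analysis like the one described above could prove
 * that sum_list(a) and sum_list(b) touch non-overlapping memory, so
 * after parallelization each unit could be served by a private cache;
 * state written by both units would instead need a coherent cache and
 * synchronization. */
#include <stdio.h>
#include <stdlib.h>

typedef struct node {
    int value;
    struct node *next;
} node_t;

/* Build a list of n nodes; in an HLS flow these nodes would reside in
 * off-chip memory and be accessed through the generated cache. */
static node_t *build_list(int n) {
    node_t *head = NULL;
    for (int i = 0; i < n; ++i) {
        node_t *p = malloc(sizeof *p);
        p->value = i;
        p->next = head;
        head = p;
    }
    return head;
}

/* Pointer-chasing traversal: irregular, data-dependent addresses that
 * fall outside the affine/array analyses used by most HLS memory
 * optimizations. */
static long sum_list(const node_t *head) {
    long s = 0;
    for (const node_t *p = head; p != NULL; p = p->next)
        s += p->value;
    return s;
}

int main(void) {
    node_t *a = build_list(1024);  /* heap region A: private to unit 0 */
    node_t *b = build_list(1024);  /* heap region B: private to unit 1 */

    /* The two traversals are independent, so parallelizing them is legal
     * with respect to data dependencies; only the final accumulation is
     * shared state. */
    long total = sum_list(a) + sum_list(b);
    printf("total = %ld\n", total);
    return 0;
}

Under the approach described in the abstract, the on-chip memory budget would then be divided among the resulting private (and, where needed, coherent) caches according to the high-level cache performance estimation.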

Keywords: HLS; custom multicache architectures; memory; heap

Journal Title: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Year Published: 2017
