Many workloads are written in garbage-collected languages and GC consumes a significant fraction of resources for these workloads. We propose to decrease this overhead by moving GC into a small hardware accelerator that is located close to the memory controller and performs GC more efficiently than a CPU. We first show a general design of such a GC accelerator and describe how it can be integrated into both stop-the-world and pause-free garbage collectors. We then demonstrate an end-to-end RTL prototype, integrated into a RocketChip RISC-V System-on-Chip (SoC) executing full Java benchmarks within JikesRVM running under Linux on FPGAs. Our prototype performs the mark phase of a tracing GC at 4.2× the performance of an in-order CPU, at just 18.5% the area. By prototyping our design in a real system, we show that our accelerator can be adopted without invasive changes to the SoC, and estimate its performance, area, and energy.
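For readers unfamiliar with the mark phase of a tracing GC mentioned above, the following is a minimal software sketch of the idea: starting from a set of roots, follow references and record every reachable object. It is a toy model for illustration only and does not represent the accelerator's design or JikesRVM's collector. The `mark` function, the dictionary-based `heap`, and the object names are all hypothetical.

```python
from collections import deque


def mark(roots, heap):
    """Return the set of object ids reachable from `roots`.

    `heap` is a hypothetical stand-in for the object graph: it maps each
    object id to the list of object ids that object references.
    """
    marked = set()
    worklist = deque()

    # Seed the traversal with the roots (e.g. stack and global references).
    for root in roots:
        if root not in marked:
            marked.add(root)
            worklist.append(root)

    # Visit each reachable object once, enqueueing unmarked referents.
    while worklist:
        obj = worklist.popleft()
        for child in heap[obj]:
            if child not in marked:
                marked.add(child)
                worklist.append(child)

    return marked


# Toy object graph: "e" references "d" but is itself unreachable from "a".
heap = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["a"], "e": ["d"]}
live = mark(["a"], heap)
```

In this sketch, anything left out of `live` (here, `"e"`) would be treated as garbage by a subsequent sweep or compaction step. The cycle between `"a"` and `"d"` is handled because each object is marked before it is enqueued.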
               