"An Eight-Core RISC-V Processor With Compute Near Last Level Cache in Intel 4 CMOS"

An eight-core 64-b processor extends RISC-V to perform multiply–accumulate (MAC) within the shared last level cache (LLC). Instead of moving data from the LLC to the core, compute near last level cache (CNC) adds MAC to the LLC datapath and performs computation near where the data are stored. The RV64GC CNC instruction set architecture (ISA) extension performs digital MAC near unmodified SRAM arrays and has a low area overhead of 1.4%. CNC increases memory access width to 512 b per instruction by avoiding bottlenecks in the on-chip networks. The operation also reduces data movement by keeping MAC results and most input operands local to the LLC slices. CNC supports computation on cached data from main memory, coherent data sharing between cores, and virtual addressing. The CNC instructions are included in C++ programs and run either on bare metal or in Linux. The 1.15-GHz chip reduces energy consumption by 52$\times$ for fully connected and 29$\times$ for convolutional deep neural network (DNN) layers, compared to scalar operation. Two benchmarks are characterized: MLPerf Tiny Anomaly Detection v0.5 latency is reduced by 4.25$\times$ to 40 $\mu\text{s}$ versus previous work, and inference latency on memory-augmented neural networks is improved by 4.1$\times$ versus scalar operation.
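To make the access-width claim concrete, the following is a minimal software sketch, not the chip's actual instructions or intrinsics. It assumes 8-bit operands (the abstract does not state the operand width), so that one 512-b access would carry 64 elements. The names `dot_scalar`, `wide_mac`, `dot_wide`, and `WideResult` are hypothetical and exist only for illustration. The sketch contrasts a scalar dot product, as used in a fully connected layer, with a version that consumes one 512-b chunk per operation and counts how many such operations are issued.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: 8-bit operands, so one 512-bit access carries 64 elements.
constexpr std::size_t kAccessBits = 512;
constexpr std::size_t kLanes = kAccessBits / 8;

// Scalar baseline: one multiply-accumulate per loop iteration.
// Assumes w and x have the same length.
std::int32_t dot_scalar(const std::vector<std::int8_t>& w,
                        const std::vector<std::int8_t>& x) {
    std::int32_t acc = 0;
    for (std::size_t i = 0; i < w.size(); ++i) {
        acc += static_cast<std::int32_t>(w[i]) * static_cast<std::int32_t>(x[i]);
    }
    return acc;
}

// Hypothetical software stand-in for one wide MAC operation: it consumes up
// to kLanes operand pairs and adds their products to a running accumulator.
std::int32_t wide_mac(const std::int8_t* w, const std::int8_t* x,
                      std::size_t n, std::int32_t acc) {
    for (std::size_t i = 0; i < n; ++i) {
        acc += static_cast<std::int32_t>(w[i]) * static_cast<std::int32_t>(x[i]);
    }
    return acc;
}

struct WideResult {
    std::int32_t value;      // dot-product result
    std::size_t operations;  // number of wide MAC operations issued
};

// Dot product expressed as a sequence of wide MAC operations.
// Assumes w and x have the same length.
WideResult dot_wide(const std::vector<std::int8_t>& w,
                    const std::vector<std::int8_t>& x) {
    WideResult r{0, 0};
    for (std::size_t i = 0; i < w.size(); i += kLanes) {
        const std::size_t n = std::min(kLanes, w.size() - i);
        r.value = wide_mac(w.data() + i, x.data() + i, n, r.value);
        ++r.operations;
    }
    return r;
}
```

Under these assumptions, a 130-element dot product takes 130 scalar MAC iterations but only three wide operations (64 + 64 + 2 elements). The sketch models only the operation count; it says nothing about the energy or latency figures reported above.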

Keywords: level cache; last level

Journal Title: IEEE Journal of Solid-State Circuits
Year Published: 2023
