An eight-core 64-b processor extends RISC-V to perform multiply–accumulate (MAC) within the shared last level cache (LLC). Instead of moving data from the LLC to the core, compute near last level cache (CNC) adds MAC to the LLC datapath and performs computation near where the data are stored. The RV64GC CNC instruction set architecture (ISA) extension performs digital MAC near unmodified SRAM arrays and has a low area overhead of 1.4%. CNC increases memory access width to 512 b per instruction by avoiding bottlenecks in the on-chip networks. The operation also reduces data movement by keeping MAC results and most input operands local to the LLC slices. CNC supports computation on cached data from main memory, coherent data sharing between cores, and virtual addressing. The CNC instructions are included in C++ programs and run either baremetal or in Linux. The 1.15-GHz chip reduces energy consumption by 52
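To make the 512-b-per-instruction MAC concrete, the following is a minimal functional sketch in plain C++, not the CNC ISA extension itself. The abstract does not state operand or accumulator widths, so the sketch assumes 64 signed 8-bit lanes per 512-b access and a 32-bit accumulator. The names `Line512`, `mac512_model`, and `dot_model` are hypothetical and exist only for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumption: one 512-b access is modeled as 64 signed 8-bit lanes.
// The abstract gives only the 512-b width, not the element type.
constexpr std::size_t kAccessBits = 512;
constexpr std::size_t kLanes = kAccessBits / 8;
using Line512 = std::array<std::int8_t, kLanes>;

// Hypothetical software model of one MAC step over a pair of 512-b operands:
// returns acc + sum_i a[i] * b[i]. On the chip this work would happen in the
// LLC datapath; here it is an ordinary loop on the core.
inline std::int32_t mac512_model(const Line512& a, const Line512& b,
                                 std::int32_t acc) {
    for (std::size_t i = 0; i < kLanes; ++i) {
        acc += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
    }
    return acc;
}

// Dot product over n_lines pairs of 512-b operands, keeping the running
// result in a single accumulator, as a stand-in for results staying local.
// Note: a 32-bit accumulator can overflow for very long inputs.
inline std::int32_t dot_model(const Line512* a, const Line512* b,
                              std::size_t n_lines) {
    std::int32_t acc = 0;
    for (std::size_t i = 0; i < n_lines; ++i) {
        acc = mac512_model(a[i], b[i], acc);
    }
    return acc;
}
```

In this model each call to `mac512_model` stands in for one wide access plus its accumulation; only the scalar accumulator would need to travel back to a core, which is the data-movement saving the abstract describes.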
               