
MRCIM: A Many-Core Reconfigurable Computing-in-Memory Processor Combining CPU and Tensor Modes for NN Acceleration

Many-core architecture is a promising approach to accelerating increasingly large neural networks (NNs). Most many-core architectures couple a standalone CPU core and a tensor core together as a compute node. However, existing architectures suffer from inefficiency at the architecture, data-flow, and control-flow levels: the standalone scalar CPU core, with its deep out-of-order pipeline and low data parallelism per instruction, incurs high hardware overhead and low throughput; fixed proportions of CPU and tensor cores execute computations alternately in each cluster, leading to core under-utilization under diverse workloads; and the MIMD parallelism strategy causes redundant instruction-cache (I-Cache) accesses, which increases power consumption. To tackle these limitations, we propose MRCIM, a many-core reconfigurable computing-in-memory (CIM) processor whose reconfigurable cores support both CPU and tensor modes. 1) We design a reconfigurable CPU core that reuses the CIM-based tensor core's inherent memory and computing logic to simplify the pipeline and improve the data parallelism of a conventional CPU. 2) We propose interleaved workload execution (IWE) and adaptive workload mapping (AWM) scheduling strategies, which dynamically adjust the proportion of CPU and tensor cores in a cluster so that both kinds of core work in parallel with high utilization. 3) We propose a hybrid MIMD/SIMD control flow that bypasses unnecessary I-Cache accesses through instruction forwarding and sharing, thereby reducing power consumption. Experimental results show that MRCIM achieves 166.48x–446.67x speedup and 96.76x–309.01x energy saving over an Intel i9-13900K CPU, and 12.62x–27.62x speedup and 5.49x–17.82x energy saving over an NVIDIA RTX 4090 GPU. Compared with state-of-the-art NN processor architectures, MRCIM achieves average speedups of 6.84x, 7.51x, and 3.66x and average energy savings of 4.57x, 3.03x, and 3.11x over Simba, LUT-ICC, and MAICC, respectively.
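The core idea behind the AWM strategy above can be illustrated with a toy sketch: instead of a fixed CPU/tensor split per cluster, the number of cores in each mode is re-derived from the mix of scalar and tensor work. This is only an illustration under assumed inputs (raw op counts); the function name and parameters are hypothetical, and the paper's actual AWM strategy is not specified here.

```python
# Toy sketch (NOT the paper's actual AWM algorithm): partition a cluster of
# reconfigurable cores between CPU mode and tensor mode in proportion to the
# scalar vs. tensor work of the current layer, so both kinds of work proceed
# in parallel instead of alternating on fixed core pools.

def adaptive_workload_mapping(total_cores, scalar_ops, tensor_ops):
    """Return (cpu_cores, tensor_cores) for one cluster.

    All names and inputs are illustrative assumptions; a real scheduler
    would act on measured utilization, not static op counts.
    """
    if total_cores < 2:
        raise ValueError("need at least one core per mode")
    total_ops = scalar_ops + tensor_ops
    if total_ops == 0:
        # No pending work: fall back to an even split.
        return total_cores // 2, total_cores - total_cores // 2
    # Proportional split, keeping at least one core in each mode so
    # neither workload type starves.
    cpu_cores = round(total_cores * scalar_ops / total_ops)
    cpu_cores = max(1, min(total_cores - 1, cpu_cores))
    return cpu_cores, total_cores - cpu_cores


if __name__ == "__main__":
    # A tensor-heavy convolution layer vs. an evenly mixed workload.
    print(adaptive_workload_mapping(16, scalar_ops=1_000, tensor_ops=15_000))  # (1, 15)
    print(adaptive_workload_mapping(16, scalar_ops=8_000, tensor_ops=8_000))   # (8, 8)
```

The point of the sketch is the contrast with a fixed partition: under a tensor-heavy layer nearly all cores switch to tensor mode, rather than leaving a fixed pool of CPU cores idle.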

Keywords: many-core; computing-in-memory; CPU; tensor core

Journal Title: IEEE Transactions on Circuits and Systems I: Regular Papers
Year Published: 2025


