
Eff-ECC: Protecting GPGPUs Register File With a Unified Energy-Efficient ECC Mechanism

Graphics processing units (GPUs) are widely used in general-purpose high-performance computing applications (i.e., GPGPUs), which require reliable execution in the presence of soft errors. To support massive thread-level parallelism, GPUs adopt a sizeable register file, which is highly vulnerable to soft errors. Although modern commercial GPUs provide single-error-correction double-error-detection (SEC-DED) error correction code (ECC) for the register file, this protection consumes a considerable amount of energy because of frequent register accesses and the leakage power of ECC storage. In this article, we propose to leverage the error sensitivity of instructions, the duplicate characteristics of same-named registers, and the error sensitivity of data bits to build a unified energy-efficient ECC mechanism for the GPGPU register file (Eff-ECC), which consists of instruction-aware ECC (IA-ECC), duplication-aware ECC (DA-ECC), and bit-aware ECC (BA-ECC). Considering the error sensitivity of instructions, IA-ECC implements ECC only for the write registers of critical instructions. Observing that same-named registers across threads usually hold the same data, DA-ECC avoids unnecessary ECC generation and verification for duplicate register values. Leveraging the inherent error tolerance of the program, BA-ECC protects only the significant bits of registers to guard against critical errors. Experimental results demonstrate that Eff-ECC reduces the energy consumption of traditional SEC-DED ECC by 86.46%.
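
The three filters described in the abstract can be illustrated with a small Python sketch. This is not the paper's implementation. It only counts how many data bits would still need ECC for one warp-wide register write under simplifying assumptions. The warp size of 32, the 32-bit register width, the `PROTECTED_BITS` value, the `critical` flag, and all function and class names are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List

WARP_SIZE = 32        # assumed number of threads (lanes) per warp
REGISTER_WIDTH = 32   # assumed register width in bits
PROTECTED_BITS = 16   # hypothetical number of significant bits covered by BA-ECC


@dataclass
class RegisterWrite:
    """One warp-wide write to a same-named register (hypothetical model)."""
    critical: bool          # assumed flag: written by an error-sensitive instruction
    lane_values: List[int]  # value written by each lane of the warp


def baseline_protected_bits(write: RegisterWrite) -> int:
    """Full protection: every bit of every lane's value is covered by ECC."""
    return len(write.lane_values) * REGISTER_WIDTH


def filtered_protected_bits(write: RegisterWrite) -> int:
    """Data bits still covered after the three filters of this sketch."""
    # IA-ECC idea: skip registers not written by critical instructions.
    if not write.critical:
        return 0
    # DA-ECC idea: lanes holding duplicate values share one ECC computation.
    distinct_values = set(write.lane_values)
    # BA-ECC idea: only the significant bits of each value are protected.
    return len(distinct_values) * PROTECTED_BITS


uniform = RegisterWrite(critical=True, lane_values=[7] * WARP_SIZE)
divergent = RegisterWrite(critical=True, lane_values=list(range(WARP_SIZE)))
skipped = RegisterWrite(critical=False, lane_values=list(range(WARP_SIZE)))
```

In this toy model, a critical write in which all lanes hold the same value needs ECC over a single 16-bit field instead of 32 full registers, and a non-critical write needs none. The counts say nothing about actual energy savings, which depend on the hardware design evaluated in the article.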

Keywords: ecc; register; energy; error; register file

Journal Title: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Year Published: 2022
