LAUSR.org creates dashboard-style pages of related content for over 1.5 million academic articles. Sign Up to like articles & get recommendations!

Experimental Findings on the Sources of Detected Unrecoverable Errors in GPUs

Photo from wikipedia

We investigate the sources of detected unrecoverable errors (DUEs) in graphics processing units (GPUs) exposed to a neutron beam. Illegal memory accesses and interface errors are among the more likely… Click to show full abstract

We investigate the sources of detected unrecoverable errors (DUEs) in graphics processing units (GPUs) exposed to a neutron beam. Illegal memory accesses and interface errors are among the more likely sources of DUEs. Error-correcting code (ECC) increases the launch failure events. Our test procedure has shown that ECC can reduce the DUEs caused by Illegal Address access up to 92% for Kepler and up to 98% for Volta. In addition, we analyze whether the compiler optimizations can impact the DUE sources distribution for the matrix multiplication. We found that the machine codes generated by the different optimization levels can change the DUE source by no more than 24% on average.

Keywords: sources detected; detected unrecoverable; unrecoverable errors; errors gpus; findings sources; experimental findings

Journal Title: IEEE Transactions on Nuclear Science
Year Published: 2022

Link to full text (if available)


Share on Social Media:                               Sign Up to like & get
recommendations!

Related content

More Information              News              Social Media              Video              Recommended



                Click one of the above tabs to view related content.