High accuracy and ever-increasing computing power have made deep neural networks (DNNs) the algorithm of choice for various machine learning, computer vision, and image processing applications across the computing spectrum. To this end, Google developed the tensor processing unit (TPU) to accelerate the computationally intensive matrix multiplication operation of a DNN on its systolic array architecture. Faults manifested in the datapath of such a systolic array, due to latent manufacturing defects or single-event effects, may lead to functional safety (FuSa) violations. Although DNNs are known to tolerate minor perturbations owing to their inherent fault-tolerant characteristics, we show that the classification accuracy of the model plummets from 97.4% to 7.75% at a fault rate of only 0.0003% in the accelerator, implying catastrophic consequences when such accelerators are deployed in mission-critical systems. Hence, to ensure the FuSa of such accelerators, this article provides an extensive FuSa assessment of the accelerator exposed to faults in the datapath, varying the network parameters and the position and characteristics of the induced error across multiple data sets. Furthermore, we propose two novel strategies to obtain a compact set of functional test patterns that detect FuSa violations in a DNN accelerator. Our experimental results demonstrate that the obtained test sets achieve an average fault coverage of 92.63% (in some cases, up to 100%) with a cardinality as low as 0.1% of the entire test data set.
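As a rough illustration of how a single datapath fault can corrupt a matrix multiplication, and why only some input patterns expose it, the following minimal Python sketch may help. It is not the fault model or the test-generation strategy of the article, which the abstract does not detail. It assumes a weight-stationary array in which the multiply-accumulate (MAC) unit at position (`fault_row`, `fault_col`) has one bit of its outgoing partial sum stuck at 0 or 1; the function names and the stuck-at model are hypothetical choices made for this example.

```python
def matmul(a, b):
    """Fault-free integer matrix multiply: a is n x k, b is k x m."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]


def force_bit(value, bit, stuck_value):
    """Return value with the given bit forced to stuck_value (0 or 1)."""
    mask = 1 << bit
    return value | mask if stuck_value else value & ~mask


def matmul_with_stuck_at(a, b, fault_row, fault_col, bit, stuck_value):
    """Matrix multiply with one stuck-at bit on a single MAC's partial sum.

    Assumption (not taken from the article): MAC (fault_row, fault_col)
    holds weight b[fault_row][fault_col], and partial sums flow down
    column fault_col, so only that output column can be corrupted.
    """
    n, k, m = len(a), len(b), len(b[0])
    out = []
    for i in range(n):
        row = []
        for j in range(m):
            acc = 0
            for t in range(k):
                acc += a[i][t] * b[t][j]
                # The faulty MAC corrupts the partial sum it passes on.
                if t == fault_row and j == fault_col:
                    acc = force_bit(acc, bit, stuck_value)
            row.append(acc)
        out.append(row)
    return out


def detects(a, b, fault_row, fault_col, bit, stuck_value):
    """A pattern detects the fault if the faulty output differs from the golden one."""
    faulty = matmul_with_stuck_at(a, b, fault_row, fault_col, bit, stuck_value)
    return faulty != matmul(a, b)


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]   # hypothetical input activations (test pattern)
    b = [[5, 6], [7, 8]]   # hypothetical stationary weights
    print(matmul(a, b))
    print(matmul_with_stuck_at(a, b, 0, 1, 3, 1))
    print(detects(a, b, 0, 1, 3, 1), detects(a, b, 0, 1, 3, 0))
```

With these inputs the stuck-at-1 fault on bit 3 changes the second output column, while the stuck-at-0 fault on the same bit goes unnoticed because that bit is already 0 in every affected partial sum. This is the sense in which a small, well-chosen set of test patterns can be worth more than a large arbitrary one.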
               