Rising temperature is an unavoidable effect in VLSI and has always been a critical issue in any system-on-chip, especially when targeting compute-intensive applications. This effect increases the delay in hardware accelerators, resulting in timing errors when the clock frequency can no longer be sustained; its impact must be carefully evaluated at design time to measure the performance degradation of the hardware accelerator. Further, hardware operating at a higher temperature ages faster, which incurs even more timing errors. This issue is usually addressed by including timing guardbands that compensate for the deleterious effects of temperature, ensuring the hardware accelerator operates within a reliable zone, i.e., without any temperature-induced timing errors at runtime. However, guardbands directly result in considerable performance and efficiency losses because the circuit is clocked below its full potential. Accelerators on edge devices often dismiss such guardbands to exploit the full potential of the designed circuits, posing an enormous design challenge because this approach requires a careful evaluation of the impact of timing errors on the quality of the target applications. Many algorithms, such as those in multimedia and machine learning applications, can tolerate hardware errors. Yet, these algorithms have a dynamic (i.e., closed-loop) behavior in which a timing error can propagate and affect subsequent steps. Measuring degradation-induced errors in these applications is very challenging, because an accurate gate-level simulation of degradation-induced timing errors must be coupled dynamically with a system-level simulator to unveil how errors in the underlying hardware ultimately impact the algorithm executing on the hardware accelerator. This is the first work to achieve this goal. State-of-the-art works have studied accelerators under timing errors when removing (or narrowing) guardbands, but their approaches were suitable only for open-loop hardware accelerators that are entirely agnostic of the algorithms' complex interactions. Unlike prior work, this paper investigates temperature- and aging-induced timing errors in the joint accelerator-algorithm interaction and their runtime impact. Our framework investigates aging effects across the different layers, from transistor physics all the way up to the algorithm layer. The hardware accelerator employed as a case study is the sum of absolute differences (SAD), the most compute-intensive accelerator in commercial video encoders for mobile applications. Our results demonstrate the runtime impact on three advanced block-matching algorithms of the video encoder operating jointly with a SAD accelerator under timing errors induced by temperature and aging effects in a 14nm FinFET technology.
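For context, the SAD kernel accelerated in this case study can be summarized by the sketch below. It is illustrative only; the 16x16 block size, data types, and function name are assumptions and are not taken from the paper or its accelerator design.

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative sketch of the sum-of-absolute-differences (SAD) kernel used in
 * block-matching motion estimation. Block size, types, and the function name
 * are assumptions made for this example. */
uint32_t sad_16x16(const uint8_t *cur, const uint8_t *ref, int stride)
{
    uint32_t sad = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++) {
            /* Accumulate the absolute pixel-wise difference between the
             * current block and a candidate block in the reference frame. */
            sad += (uint32_t)abs((int)cur[y * stride + x] -
                                 (int)ref[y * stride + x]);
        }
    }
    return sad;
}
```

A block-matching algorithm evaluates this kernel over many candidate positions in the reference frame and keeps the minimum, so a timing error that corrupts a single SAD value can steer the search toward a suboptimal motion vector, which is one way the closed-loop behavior discussed above lets a single hardware error affect subsequent steps.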