This article presents a highly efficient, performance-enhanced Softmax Function (SF) designed for a deep neural network accelerator. The SF is an essential component of deep learning models, used primarily in the classification layer and also in the hidden layers of advanced architectures such as Transformer and Capsule networks. The primary challenge in designing an efficient hardware architecture for the SF lies in its computationally complex exponential and division sub-blocks. To address this challenge, a hardware-optimized pipelined CORDIC-based architecture is proposed, leveraging the mutual exclusivity of the COordinate Rotation DIgital Computer (CORDIC) algorithm and targeting improved throughput, area, and power. To preserve accuracy in deep learning models, the proposed SF design undergoes a Pareto study that evaluates accuracy as a function of the number of pipeline stages. The proposed design is quantized to 16-bit precision, and inference accuracy is validated on several datasets. The SF is prototyped on a Xilinx Zynq FPGA operating at 685 MHz, and an ASIC implementation is performed in a 45-nm technology node with a maximum operating frequency of 5 GHz. The design incurs a validation accuracy loss of less than 2% while reducing silicon area and Energy-Delay Product (EDP) by $12\times$. Post-synthesis simulation results indicate that the proposed design outperforms state-of-the-art architectures, achieving $3\times$ better area, power, and logic delay.
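To illustrate the idea behind replacing the exponential and division sub-blocks with CORDIC, the following is a minimal software sketch, not the authors' RTL: it uses floating point rather than the paper's 16-bit fixed point, the function names are hypothetical, and an ordinary division stands in for the CORDIC linear-mode divider the hardware would use. The exponential is computed with hyperbolic CORDIC in rotation mode (shift-and-add only), after a standard base-2 range reduction whose final scaling by $2^{q}$ would map to a barrel shift in hardware.

```python
import math


def cordic_exp(z, n_iter=16):
    """exp(z) via hyperbolic CORDIC in rotation mode (shift-and-add sketch).

    Range reduction: z = q*ln2 + r with |r| <= ln2/2, so exp(z) = 2**q * exp(r);
    the final multiplication by 2**q would be a barrel shift in hardware.
    """
    ln2 = math.log(2.0)
    q = math.floor(z / ln2 + 0.5)
    r = z - q * ln2

    # Hyperbolic iteration schedule: i = 1, 2, 3, 4, 4, 5, ...; the indices
    # 4, 13, 40, ... are repeated to guarantee convergence.
    sched, i, rep = [], 1, 4
    while len(sched) < n_iter:
        sched.append(i)
        if i == rep:
            sched.append(i)
            rep = 3 * rep + 1
        i += 1
    sched = sched[:n_iter]

    # Hyperbolic CORDIC gain: product of sqrt(1 - 2^-2i) over the schedule.
    gain = 1.0
    for i in sched:
        gain *= math.sqrt(1.0 - 2.0 ** (-2 * i))

    # Start at (1/gain, 0) so the final x + y equals cosh(r) + sinh(r) = exp(r).
    x, y = 1.0 / gain, 0.0
    for i in sched:
        d = 1.0 if r >= 0.0 else -1.0
        x, y = x + d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        r -= d * math.atanh(2.0 ** -i)
    return (x + y) * 2.0 ** q


def cordic_softmax(logits, n_iter=16):
    """Softmax built on the CORDIC exponential; the normalization division
    would map to a CORDIC linear-mode (vectoring) divider in hardware."""
    m = max(logits)                                  # max-subtraction for numerical stability
    exps = [cordic_exp(v - m, n_iter) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


print(cordic_softmax([2.0, 1.0, 0.1]))  # ~[0.659, 0.242, 0.099]
```

Because every micro-rotation reduces to shifts, additions, and a small angle table, the same datapath can be time-shared or pipelined across the exponential and division modes, which is the kind of reuse the mutually exclusive CORDIC configurations in the proposed architecture exploit.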