This article presents the generative adversarial network processing unit (GANPU), an energy-efficient multiple deep neural network (DNN) training processor for GANs. It enables on-device training of GANs on performance- and battery-limited mobile devices without sending user-specific data to servers, thereby avoiding privacy concerns. Training GANs requires a massive amount of computation and is therefore difficult to accelerate on a resource-constrained platform. In addition, the networks and layers in a GAN show dramatically changing operational characteristics, making it difficult to optimize the processor's core and bandwidth allocation. For higher throughput and energy efficiency, this article proposes three key features. First, adaptive spatiotemporal workload multiplexing maintains high utilization while accelerating the multiple DNNs in a single GAN model. Second, to exploit ReLU sparsity during both inference and training, a dual-sparsity exploitation architecture skips redundant computations caused by input- and output-feature zeros. Third, an exponent-only ReLU speculation (EORS) algorithm, together with a lightweight processing element (PE) architecture, estimates the locations of output-feature zeros during inference with minimal hardware overhead. Fabricated in a 65-nm process, the GANPU achieves an energy efficiency of 75.68 TFLOPS/W for 16-bit floating-point computation, which is 4.85× higher than the state of the art. As a result, GANPU enables on-device training of GANs with high energy efficiency.
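To make the dual-sparsity idea concrete, the sketch below shows the underlying zero-skipping principle in NumPy: multiply-accumulates are skipped both for zero input features (e.g. ReLU activations in the forward pass) and for output features already known to be zero (e.g. dead ReLU neurons whose gradients vanish during backpropagation, or outputs predicted zero by speculation). The function name, data layout, and `output_mask` argument are illustrative assumptions, not the GANPU datapath.

```python
import numpy as np

def sparse_matvec(weights, activations, output_mask):
    """Matrix-vector product with dual zero-skipping (illustrative only).

    weights:      (out_dim, in_dim) FP16 weight matrix
    activations:  (in_dim,) FP16 input features, often sparse after ReLU
    output_mask:  (out_dim,) bool, False where the output is known to be zero
    """
    out = np.zeros(weights.shape[0], dtype=np.float16)
    nz_inputs = np.flatnonzero(activations)        # skip input-feature zeros
    for o in np.flatnonzero(output_mask):          # skip output-feature zeros
        acc = np.float32(0.0)
        for i in nz_inputs:                        # only non-zero activations reach the MAC
            acc += np.float32(weights[o, i]) * np.float32(activations[i])
        out[o] = np.float16(acc)
    return out
```

In hardware, the same effect is achieved by gating or compressing the skipped operands rather than looping over index lists, but the arithmetic that survives is the same.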
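The abstract does not spell out the EORS rule, but the name suggests predicting the sign of a pre-activation from operand exponents alone, so that outputs speculated to be negative (and thus zeroed by ReLU) never occupy the full-precision datapath. The following is a minimal NumPy sketch of that idea, assuming FP16 operands; `exponent_sign_estimate` and `speculate_relu_zero` are hypothetical names and the speculation rule is a simplification, not the paper's exact PE logic.

```python
import numpy as np

def exponent_sign_estimate(w, x):
    """Coarse per-product estimate of w*x using only signs and exponents.

    Each product magnitude is approximated as 2^(e_w + e_x); mantissas are
    ignored entirely, so no full-precision multiplier is needed.
    """
    sign = np.signbit(w) ^ np.signbit(x)              # XOR of sign bits
    _, e_w = np.frexp(w.astype(np.float32))           # extract exponents only
    _, e_x = np.frexp(x.astype(np.float32))
    mag = np.exp2((e_w + e_x).astype(np.float32))
    mag = np.where((w != 0) & (x != 0), mag, 0.0)     # zero operand -> zero product
    return np.where(sign, -mag, mag)

def speculate_relu_zero(weights, inputs, bias=0.0):
    """Return True if the exponent-only estimate of w.x + b is negative,
    i.e. the ReLU output is speculated to be zero and the exact FP16
    computation for this output feature can be skipped."""
    return exponent_sign_estimate(weights, inputs).sum() + bias < 0.0

# Usage: run the exact FP16 dot product only for outputs speculated non-zero.
rng = np.random.default_rng(0)
w = rng.standard_normal(64).astype(np.float16)
x = rng.standard_normal(64).astype(np.float16)
if speculate_relu_zero(w, x):
    y = np.float16(0.0)                               # predicted dead output
else:
    y = np.maximum(np.float16(0.0), np.dot(w, x))     # exact computation
```

Mispredictions are possible with such a coarse estimate, which is why a speculation scheme of this kind trades a small accuracy risk for skipped work; how GANPU bounds that risk is detailed in the full article.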