CNN Acceleration With Hardware-Efficient Dataflow for Super-Resolution

Convolutional neural network (CNN)-based super-resolution (SR) has shown outstanding performance in the field of computer vision. The implementation of inference hardware for CNN-based SR suffers from intensive computation with a severely unbalanced computation load among layers. Various lightweight SR networks have been studied that incur little performance degradation; however, a hardware-efficient dataflow is also required to accelerate inference efficiently within limited resources. In this article, we propose a hardware-efficient dataflow for CNN-based SR that reduces the computation load by increasing data reuse and raises processing element (PE) utilization by balancing the computation load among layers for high throughput. In the proposed dataflow, row-wise pixels in the receptive field are computed by circularly shifting memory addresses to maximize data reuse. Partial convolution is exploited in a layer-based pipeline architecture to relieve the intensive computation of a single pipeline stage. Delay balancing with adjusted parallelism is employed to balance computation precisely across all layers. Furthermore, inference hardware for CNN-based SR is implemented for 4K ultrahigh definition at 60 fps on a field-programmable gate array (FPGA). For hardware-friendly computation, quantization of activations and weights is adopted. The proposed hardware achieves an average peak signal-to-noise ratio of 36.42 dB on the Set-5 dataset with a memory usage of 53 KB and an average PE utilization of 76.7% across all layers. Thus, it achieves the lowest memory usage and the highest PE utilization compared with other inference hardware for CNN-based SR.
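
The central data-reuse idea in the abstract, computing row-wise pixels in the receptive field while circularly shifting memory addresses, can be illustrated in software. The Python sketch below is not the authors' hardware design; the 3x3 kernel size, the buffer layout, and the function name are assumptions used only to show how rotating the row-address mapping lets K-1 of the K buffered rows be reused between consecutive output rows.

import numpy as np

def conv2d_with_circular_line_buffer(image, kernel):
    """Slide a KxK kernel over `image`, keeping only K rows in a line buffer.

    Instead of re-reading the whole receptive field for every output row,
    the oldest buffered row is overwritten in place and the row *addresses*
    are rotated, so K-1 of the K rows are reused between consecutive rows.
    (Illustrative sketch only; not the paper's implementation.)
    """
    H, W = image.shape
    K = kernel.shape[0]
    out = np.zeros((H - K + 1, W - K + 1))

    # Line buffer holds only K rows of the image at any time.
    line_buf = image[:K].astype(float).copy()
    # Logical-to-physical row mapping; the "circular shift" rotates this list.
    row_order = list(range(K))

    for out_r in range(H - K + 1):
        # Gather the receptive-field rows in logical (top-to-bottom) order.
        window_rows = line_buf[row_order]
        for out_c in range(W - K + 1):
            out[out_r, out_c] = np.sum(window_rows[:, out_c:out_c + K] * kernel)

        # Reuse K-1 rows: overwrite only the oldest physical row with the
        # next image row, then rotate the address mapping by one position.
        if out_r + K < H:
            oldest = row_order[0]
            line_buf[oldest] = image[out_r + K]
            row_order = row_order[1:] + [oldest]

    return out

For a 3x3 kernel, this scheme fetches only one new image row per output row while the other two buffered rows are reused by address rotation alone, which is the kind of on-chip data reuse the proposed dataflow targets.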

Keywords: efficient dataflow; hardware-efficient; hardware; CNN-based; computation; CNN

Journal Title: IEEE Access
Year Published: 2020
