This paper proposes an effective hardware accelerator for the 2D $8\times 8$ discrete cosine transform (DCT) and inverse discrete cosine transform (IDCT) using an improved Loeffler architecture. The accelerator optimizes the data stream of the Loeffler 8-point 1D DCT/IDCT according to the characteristics of image and video processing. An 8-stage pipeline structure greatly improves the processing speed by sensibly partitioning the computation across clock cycles and simplifying the arithmetic operations in each cycle. The multiplication-free approximation of the DCT coefficients is implemented with adders and shifters, using fixed-point representation together with canonic signed digit (CSD) coding. In particular, the proposed fast parallel transposed matrix architecture performs the row-column coefficient conversion with lower circuit complexity. The FPGA implementation of the proposed architecture uses a Virtex-7 XC7VX330T device running at 288 MHz, with a throughput of 558 Mpixel/s and a Full HD real-time frame rate of up to 269 fps. Only 33 clock cycles are required to complete the 2D DCT/IDCT of an $8\times 8$ block, so the design can serve as a high-performance hardware accelerator for image and video compression encoding.
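The multiplication-free approximation mentioned above can be illustrated in software. The following is a minimal sketch, not the paper's implementation: it converts a fixed-point constant to CSD digits and multiplies by it using only shifts, additions, and subtractions. The function names `to_csd` and `csd_multiply`, the choice of 8 fractional bits, and the example constant $\cos(\pi/4)$ are assumptions made for illustration; the abstract does not state the coefficient word length actually used.

```python
import math


def to_csd(n):
    """Return the CSD digits of a non-negative integer, least significant first.

    Each digit is -1, 0, or +1, and no two adjacent digits are both nonzero.
    """
    digits = []
    while n != 0:
        if n & 1:
            # n mod 4 == 1 gives digit +1; n mod 4 == 3 gives digit -1.
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_multiply(x, digits, frac_bits):
    """Multiply integer x by a CSD-coded fixed-point constant using shifts and adds."""
    acc = 0
    for shift, d in enumerate(digits):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    # Drop the fractional bits of the constant (arithmetic shift, rounds toward -inf).
    return acc >> frac_bits


# Assumed word length for illustration only: 8 fractional bits.
FRAC_BITS = 8
COEFF = round(math.cos(math.pi / 4) * (1 << FRAC_BITS))  # fixed-point cos(pi/4)
COEFF_CSD = to_csd(COEFF)
```

For example, `csd_multiply(100, COEFF_CSD, FRAC_BITS)` gives the same result as `(100 * COEFF) >> FRAC_BITS`, but each nonzero CSD digit corresponds to one shifted add or subtract, which is how a hardware datapath avoids a general multiplier.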