Demand MemCpy: Overlapping of Computation and Data Transfer for Heterogeneous Computing

Heterogeneous computing relies on collaboration among different types of processors on shared data. In systems with discrete accelerators (e.g., GP-GPUs), sharing data requires transferring large amounts of data between CPU and accelerator memories, which can significantly increase end-to-end execution time. This paper proposes a novel mechanism called Demand MemCpy (DMC) to hide the data sharing overheads. DMC copies data from host memory to accelerator memory on demand at page granularity. It uses a hardware-only mechanism to fetch the requested page with short latency and a background pre-copy to fetch related pages in advance. Our evaluation shows that DMC can reduce the end-to-end execution time of GP-GPU applications by 25.4% on average by overlapping computation with data transfer and by not transferring unused pages.
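
The page-granular, demand-driven copy described in the abstract can be illustrated with a small software model. The sketch below is not the paper's hardware mechanism. It is a toy simulation that only counts which pages would be transferred. The class name `DemandCopyModel`, the `precopy_window` parameter, and the assumption that "related pages" are the pages immediately following the accessed one are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DemandCopyModel:
    """Toy model of page-granular demand copy with a sequential pre-copy window.

    Assumption (not from the paper): "related pages" are modeled as the next
    `precopy_window` pages after the one being accessed.
    """

    num_pages: int                                # pages in the host buffer
    precopy_window: int = 2                       # hypothetical look-ahead depth
    resident: set = field(default_factory=set)    # pages already in accelerator memory
    demand_fetches: int = 0                       # pages copied because an access missed
    precopy_fetches: int = 0                      # pages copied ahead of use

    def access(self, page: int) -> None:
        """Record an accelerator access to `page`, copying pages as needed."""
        if not 0 <= page < self.num_pages:
            raise IndexError(f"page {page} is outside the buffer")

        # Demand path: the accessed page is missing, so it is fetched now.
        if page not in self.resident:
            self.resident.add(page)
            self.demand_fetches += 1

        # Background path: top up the look-ahead window after every access.
        last = min(page + 1 + self.precopy_window, self.num_pages)
        for nxt in range(page + 1, last):
            if nxt not in self.resident:
                self.resident.add(nxt)
                self.precopy_fetches += 1

    @property
    def pages_transferred(self) -> int:
        """Total pages moved to the accelerator so far."""
        return self.demand_fetches + self.precopy_fetches


if __name__ == "__main__":
    # A kernel that touches only the first 4 pages of a 16-page buffer.
    model = DemandCopyModel(num_pages=16, precopy_window=2)
    for p in [0, 1, 2, 3]:
        model.access(p)
    print("demand fetches:", model.demand_fetches)
    print("pre-copied pages:", model.precopy_fetches)
    print("pages never transferred:", model.num_pages - model.pages_transferred)
```

In this model a conventional up-front copy would move all `num_pages` pages before the kernel starts, whereas the demand-driven variant moves only the touched pages plus the look-ahead window. The model says nothing about latency or overlap with computation, which in the paper depend on the hardware fetch path.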

Keywords: overlapping computation; data transfer; heterogeneous computing; demand memcpy; computation data

Journal Title: IEEE Access
Year Published: 2022
