Server workloads frequently encounter L1-D cache misses and hence lose significant performance potential. One way to reduce the number of L1-D misses, or their effect, is data prefetching. Because L1-D access sequences have high temporal correlation, temporal prefetching techniques are promising for L1 caches. State-of-the-art temporal prefetchers are effective at reducing the number of L1-D misses, but we observe a significant gap between what they offer and the available opportunity. This work aims to improve the effectiveness of temporal prefetching. To overcome the deficiencies of existing temporal prefetchers, we introduce Domino prefetching. The Domino prefetcher is a temporal prefetching technique that looks up the miss history to find the last occurrence of the last one or two L1-D miss addresses and prefetches from that point. We show that the Domino prefetcher captures more than 87 percent of the temporal opportunity at L1-D. Through evaluation of a 16-core processor on a set of server workloads, we show that the Domino prefetcher improves system performance by 26 percent (up to 56 percent).
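The lookup idea described above can be illustrated with a small software sketch. It is a simplified model under stated assumptions, not the authors' hardware design: the class name `TemporalPrefetcherSketch`, the `degree` parameter, and the unbounded Python list and dictionaries are all hypothetical choices made for illustration. On each miss, the sketch first tries to match the last two miss addresses against the history; if that pair has not been seen, it falls back to the last occurrence of the single most recent miss address.

```python
class TemporalPrefetcherSketch:
    """Toy model of history lookup with one or two miss addresses.

    Assumptions (not from the source): unbounded history, dictionary
    indexes, and a fixed prefetch degree.
    """

    def __init__(self, degree=1):
        self.degree = degree   # how many following addresses to prefetch
        self.history = []      # global sequence of miss addresses
        self.last_one = {}     # addr -> index of its last occurrence
        self.last_two = {}     # (prev, addr) -> index of addr in that pair

    def on_miss(self, addr):
        """Record a miss and return the list of addresses to prefetch."""
        prev = self.history[-1] if self.history else None

        # Prefer a match on the last two misses; fall back to one.
        pos = None
        if prev is not None:
            pos = self.last_two.get((prev, addr))
        if pos is None:
            pos = self.last_one.get(addr)

        # Prefetch the addresses that followed the matched occurrence.
        prefetches = []
        if pos is not None:
            prefetches = self.history[pos + 1 : pos + 1 + self.degree]

        # Append the new miss and update both indexes.
        self.history.append(addr)
        idx = len(self.history) - 1
        self.last_one[addr] = idx
        if prev is not None:
            self.last_two[(prev, addr)] = idx
        return prefetches


def replay(misses, degree=1):
    """Feed a miss sequence through the sketch; return prefetches per miss."""
    pf = TemporalPrefetcherSketch(degree)
    return [pf.on_miss(a) for a in misses]
```

For example, with the miss sequence `A, B, C, X, B, Y, A, B` (any distinct integers), the final miss on `B` follows `A`, so the two-address match points back to the first `A, B` and the sketch prefetches `C`. A single-address lookup alone would have matched the more recent `X, B` and prefetched `Y` instead.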