In-memory computing has inspired researchers to consider integrating large-capacity persistent memory (PM) into the main memory subsystem. However, several challenges remain in providing an integration approach for DRAM-comparable PM on existing enterprise servers. Current commercial servers tend to feature multiple sockets with shared-memory NUMA organizations. Simply constructing a hybrid main memory architecture for these NUMA organizations requires considerable modifications to the system software. Another significant problem in these designs is the high latency of accessing PM on a remote socket, which degrades performance. To address these problems, we integrate PM as a memory-based model and as a storage-based model simultaneously on one commercial server, which offers enterprises a shortcut to building commercial NUMA machines with large-capacity PM. In the memory-based model, rather than focusing on the persistence attribute, we propose an architecture that manages the integrated PM and DRAM space in a unified manner and avoids extensive modifications to the system software. We also present an adaptive mechanism that automatically introduces a moderate amount of PM into the local socket, according to the degree of memory pressure, to reduce accesses to a remote socket. In the storage-based model, to take full advantage of the PM's persistence, we abstract a PM volume device and overcome the torn sector problem. To demonstrate the effectiveness of the proposed scheme, we design and implement Dapper, an adaptive persistent memory manager prototype. The experimental results show that, compared to typical memory management approaches, Dapper achieves average performance improvements of 13.1 percent and 34.0 percent on Graph500 BFS_SSSP benchmarks and SPEC CPU2006 floating point workloads, respectively.
Moreover, when deploying F2FS on our PM volume, we find that Dapper outperforms existing methods by 5.8 percent on tar and by 11.9 percent on untar.
               