Articles with "failure tolerant" as a keyword




Failure Tolerant Training With Persistent Memory Disaggregation Over CXL

Published in 2023 at "IEEE Micro"

DOI: 10.1109/mm.2023.3237548

Abstract: This article proposes TrainingCXL, which can efficiently process large-scale recommendation datasets in a pool of disaggregated memory while making training fault tolerant with low overhead. To this end, we integrate persistent memory (PMEM) and graphics…

Keywords: tolerant training; training; CXL; persistent memory ...