Articles with "ebpf" as a keyword

ALEPH: Accelerating Distributed Training With eBPF-Based Hierarchical Gradient Aggregation

Published in "IEEE/ACM Transactions on Networking", 2024

DOI: 10.1109/TNET.2024.3404999

Abstract: Distributed training involves two important operations, gradient transmission and gradient aggregation, which consume massive bandwidth and computing resources. To achieve efficient distributed training, one must overcome two critical challenges: heterogeneity of bandwidth resources and…
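The "hierarchical" aggregation named in the title can be illustrated with a minimal sketch, assuming a two-level topology: workers first combine gradients within their own rack over fast local links, and only the per-rack partial sums cross the scarcer inter-rack links. This is a plain-Python illustration of the general idea, not ALEPH's actual eBPF-based implementation; names such as `workers_by_rack` are hypothetical.

```python
# Illustrative two-level (hierarchical) gradient aggregation.
# NOT the ALEPH eBPF implementation; a generic sketch of the concept.

def aggregate(gradients):
    """Element-wise sum of a list of equal-length gradient vectors."""
    return [sum(vals) for vals in zip(*gradients)]

def hierarchical_aggregate(workers_by_rack):
    """Level 1: sum gradients within each rack (local, cheap links).
    Level 2: sum the per-rack partial results (cross-rack, scarce links).
    Averaging at the end matches flat all-reduce semantics."""
    rack_sums = [aggregate(rack) for rack in workers_by_rack]
    total = aggregate(rack_sums)
    n_workers = sum(len(rack) for rack in workers_by_rack)
    return [g / n_workers for g in total]

# Hypothetical example: two racks, three workers, 2-element gradients.
racks = [
    [[1.0, 2.0], [3.0, 4.0]],   # rack A: two workers
    [[5.0, 6.0]],               # rack B: one worker
]
print(hierarchical_aggregate(racks))  # [3.0, 4.0]
```

The benefit of the hierarchy is traffic reduction on the bottleneck tier: each rack forwards one partial sum upward instead of one gradient per worker, which matters when cross-rack bandwidth is the heterogeneous, contended resource the abstract alludes to.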

Keywords: gradient aggregation; ebpf; distributed training;