Articles with "ebpf" as a keyword

ALEPH: Accelerating Distributed Training With eBPF-Based Hierarchical Gradient Aggregation

Published in "IEEE/ACM Transactions on Networking", 2024

DOI: 10.1109/TNET.2024.3404999

Abstract: Distributed training involves two important operations, gradient transmission and gradient aggregation, which consume massive bandwidth and computing resources. To achieve efficient distributed training, one must overcome two critical challenges: heterogeneity of bandwidth resources and…
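The "hierarchical" aggregation named in the title can be illustrated with a minimal sketch, assuming a two-level topology: workers first combine gradients within their own rack over fast local links, and only the per-rack partial sums cross the scarcer inter-rack links. This is a plain-Python illustration of the general idea, not ALEPH's actual eBPF-based implementation; names such as `workers_by_rack` are hypothetical.

```python
# Illustrative two-level (hierarchical) gradient aggregation.
# NOT the ALEPH eBPF implementation; a generic sketch of the concept.

def aggregate(gradients):
    """Element-wise sum of a list of equal-length gradient vectors."""
    return [sum(vals) for vals in zip(*gradients)]

def hierarchical_aggregate(workers_by_rack):
    """Level 1: sum gradients within each rack (local, cheap links).
    Level 2: sum the per-rack partial results (cross-rack, scarce links).
    Averaging at the end matches flat all-reduce semantics."""
    rack_sums = [aggregate(rack) for rack in workers_by_rack]
    total = aggregate(rack_sums)
    n_workers = sum(len(rack) for rack in workers_by_rack)
    return [g / n_workers for g in total]

# Hypothetical example: two racks, three workers, 2-element gradients.
racks = [
    [[1.0, 2.0], [3.0, 4.0]],   # rack A: two workers
    [[5.0, 6.0]],               # rack B: one worker
]
print(hierarchical_aggregate(racks))  # [3.0, 4.0]
```

The benefit of the hierarchy is traffic reduction on the bottleneck tier: each rack forwards one partial sum upward instead of one gradient per worker, which matters when cross-rack bandwidth is the heterogeneous, contended resource the abstract alludes to.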

Keywords: gradient aggregation; ebpf; distributed training;