Published in 2024 at "IEEE/ACM Transactions on Networking"
DOI: 10.1109/tnet.2024.3404999
Abstract: Distributed training includes two important operations: gradient transmission and gradient aggregation, which will consume massive bandwidth and computing resources. To achieve efficient distributed training, one must overcome two critical challenges: heterogeneity of bandwidth resources and…
Keywords:
gradient aggregation;
eBPF;
distributed training;