Large-scale matrix multiplication is a critical operation in fields such as machine learning, scientific computing, and graphics processing, but performing it on a single machine incurs significant computational latency. The matrices are therefore partitioned along different dimensions, the multiplication is decomposed into multiple subtasks, and the subtasks are executed in a distributed system. However, stragglers (slow or unresponsive worker nodes) can severely slow distributed matrix multiplication, so coding schemes have been introduced to mitigate the straggler problem. Recently, coding schemes for three-dimensional matrix partitioning, including DEP codes, have gained increasing attention. However, these schemes have not focused on decoding accuracy or job completion time. In this paper, to enhance decoding accuracy and reduce job completion time, we combine a grouping strategy with Systematic Matdot codes and propose Grouped Systematic Matdot (GSM) codes. Experimental results demonstrate that, compared with DEP codes, GSM codes ensure 100% decoding accuracy and achieve shorter encoding, communication, and local computation times, thereby reducing job completion time by at least 45%. Moreover, GSM codes consume less memory, and their time advantage becomes more pronounced as the matrix size increases.
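
To make the straggler-mitigation idea concrete, the following is a minimal sketch of plain MatDot coding, the building block that GSM codes extend. It is not the GSM scheme itself: the abstract does not specify the grouping or the systematic structure, so neither is modelled here. The function names (`matdot_encode`, `matdot_decode`, `lagrange_coeff`), the evaluation points, and the choice of which workers straggle are all illustrative assumptions. Exact rational arithmetic is used so that decoding is exact; floating-point interpolation may lose accuracy.

```python
from fractions import Fraction


def matmul(X, Y):
    """Plain product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def matdot_encode(A, B, m, points):
    """Return one coded pair (pA(x), pB(x)) per evaluation point x.

    A is split into m column blocks A_i and B into m row blocks B_j, with
    pA(x) = sum_i A_i x^i and pB(x) = sum_j B_j x^(m-1-j).
    """
    n, p = len(A), len(B[0])
    w = len(A[0]) // m  # width of each A_i and height of each B_j
    tasks = []
    for x in points:
        pA = [[sum(A[r][i * w + c] * x ** i for i in range(m)) for c in range(w)]
              for r in range(n)]
        pB = [[sum(B[j * w + r][c] * x ** (m - 1 - j) for j in range(m)) for c in range(p)]
              for r in range(w)]
        tasks.append((pA, pB))
    return tasks


def lagrange_coeff(xs, k, power):
    """Coefficient of x^power in the k-th Lagrange basis polynomial over nodes xs."""
    coeffs = [Fraction(1)]  # coeffs[d] multiplies x^d
    for j, xj in enumerate(xs):
        if j == k:
            continue
        denom = xs[k] - xj
        nxt = [Fraction(0)] * (len(coeffs) + 1)
        for d, c in enumerate(coeffs):  # multiply by (x - xj) / denom
            nxt[d + 1] += c / denom
            nxt[d] -= c * xj / denom
        coeffs = nxt
    return coeffs[power]


def matdot_decode(results, m):
    """Recover AB, the x^(m-1) coefficient of pA(x)pB(x), from any 2m-1 results."""
    if len(results) < 2 * m - 1:
        raise ValueError("need at least 2m-1 finished workers")
    xs = list(results)[: 2 * m - 1]
    weights = [lagrange_coeff(xs, k, m - 1) for k in range(len(xs))]
    rows, cols = len(results[xs[0]]), len(results[xs[0]][0])
    return [[sum(wt * results[x][r][c] for wt, x in zip(weights, xs)) for c in range(cols)]
            for r in range(rows)]


# Toy run: 5 workers, m = 2 blocks, so any 3 finished workers suffice.
A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[1, 0, 2, 1], [0, 1, 1, 3], [2, 2, 0, 1], [1, 1, 1, 0]]
m = 2
points = [Fraction(x) for x in range(1, 6)]
tasks = matdot_encode(A, B, m, points)
# Each worker multiplies its coded pair; workers 1 and 3 are assumed to straggle.
finished = {x: matmul(pA, pB)
            for i, (x, (pA, pB)) in enumerate(zip(points, tasks)) if i not in (1, 3)}
C = matdot_decode(finished, m)
```

In this toy run the master recovers the full product from three of the five workers, which is how coding lets a job finish without waiting for stragglers. Each worker handles only a block-sized product rather than the full multiplication.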
               