Multi-threading is a common way for programs to benefit from multi-core and many-core processors. However, the performance of some parallel programs does not increase, and may even decrease, as the number of cores and threads grows. Our study shows that the performance of a parallel program depends on the number of cores and threads, the thread placement, and the program's inputs. Identifying the optimal number of cores and the corresponding thread placement is nontrivial when the input of a program is determined online and the workload may differ from one iteration to the next. To address this problem, we propose Otter, a runtime thread auto-tuning system for iterative parallel programs. Otter collects runtime information during the first few iterations and then decides on the number of threads and the thread placement policy, with the goal of either improving performance or saving resources. It takes the dynamic workload of the iteration process into account and reduces time overhead through a migration method. Experiments on a 96-core machine show that, on average, Otter improves benchmark performance by 20.7% and reduces core hours by 51.3% compared with running the benchmarks on all CPU cores.
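The general idea of probing during the first few iterations can be illustrated with a minimal sketch. This is not Otter's actual algorithm, which the abstract does not specify; thread placement and migration are not modeled. The names `pick_thread_count`, `run_iteration`, `tolerance`, and the cost model `synthetic_iteration` are all hypothetical, introduced only for illustration.

```python
from typing import Callable, Dict, Sequence


def pick_thread_count(
    run_iteration: Callable[[int], float],
    candidates: Sequence[int],
    probes_per_candidate: int = 2,
    tolerance: float = 0.05,
) -> int:
    """Probe each candidate thread count for a few iterations and choose one.

    run_iteration(n) is assumed to execute one iteration of the program with
    n threads and return its elapsed time. Among the candidates whose mean
    time is within `tolerance` of the fastest, the smallest thread count is
    returned, so that cores are not spent on negligible speedups.
    """
    if not candidates:
        raise ValueError("at least one candidate thread count is required")

    mean_time: Dict[int, float] = {}
    for n in candidates:
        samples = [run_iteration(n) for _ in range(probes_per_candidate)]
        mean_time[n] = sum(samples) / len(samples)

    fastest = min(mean_time.values())
    near_optimal = [n for n, t in mean_time.items() if t <= fastest * (1 + tolerance)]
    return min(near_optimal)


def synthetic_iteration(n_threads: int) -> float:
    """Hypothetical cost model, not measured data: the parallel part shrinks
    with more threads while contention overhead grows linearly."""
    return 100.0 / n_threads + 0.5 * n_threads


if __name__ == "__main__":
    counts = [1, 2, 4, 8, 16, 32, 64, 96]
    print(pick_thread_count(synthetic_iteration, counts))
    print(pick_thread_count(synthetic_iteration, counts, tolerance=0.2))
```

Under this synthetic model the fastest setting uses far fewer threads than the machine offers, and loosening `tolerance` trades a little speed for fewer cores, which mirrors the two goals named in the abstract (improving performance or saving resources). A real tuner would also need to re-probe when the per-iteration workload changes.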
               