Providing fair-share scheduling on symmetric multicore computing systems via progress balancing

Linux, which has emerged as the most favored operating system, falls short of expectations when it comes to fair share multicore scheduling. This is because the primary task scheduler of the mainline Linux kernel, called the completely fair scheduler (CFS), cannot provide a desired level of fairness in a multicore system. The CFS consists of two components: a per-core fair-share scheduler and a load balancer. Each component has a shortcoming in achieving multicore fairness.

First, the current realization of a task’s virtual runtime in CFS cannot be utilized for achieving fairness in a multicore environment. The goal of CFS is to approximate GPS [1] by giving each task a CPU time in proportion to its assigned weight. At the per-core level, CFS successfully achieves this goal by using the virtual runtime. At every scheduling decision point, a per-core scheduler dispatches the task with the smallest virtual runtime on the core. However, in a multicore system, the virtual runtime cannot be a global measure for a task’s progress since the per-core scheduler of CFS maintains only relative order among the runnable tasks in each run-queue by locally manipulating virtual runtimes when a task is enqueued or dequeued.

Second, CFS relies on load balancing to achieve multicore fairness. As pointed out earlier, balancing loads among cores is irrelevant to bounding differences in the virtual runtimes of tasks. Note that virtual runtime distribution varies on cores regardless of loads and that the per-core schedulers of CFS run independently of each other. To exacerbate the problem, CFS cannot provide perfect load balancing for two reasons: weight quantization error and reluctant load balancing, as discussed in [2]. The quantization error occurs because a task’s weight is specified with one of the predefined positive integers in Linux implementations. In addition, CFS allows for persistent load imbalance among cores since it is very reluctant to perform load balancing, unless the load imbalance exceeds a given threshold.

Virtual runtime difference divergence of CFS
Virtual Runtime Difference Divergence of CFS

To address this problem, we introduce a new algorithm for task migration, which we name Progress Balancing. The proposed algorithm periodically partitions runnable tasks into the same number of groups as the number of cores in the system and shuffles the tasks in such a way that a core with larger virtual runtimes receives a larger load and the load differences among cores are bounded. The proposed algorithm achieves this property by incorporating a work-non-conserving mechanism called throttling. This property ensures that tasks with larger virtual runtimes run at a slower pace until the next period. In addition, we elaborately redesign CFS so that it keeps track of the cumulative virtual runtimes of the tasks. Our formal analysis shows that our approach achieves a constant bound on the virtual runtime differences, regardless of the number of tasks or cores.

 Overview of the Progress Balancing
Overview of the Progress Balancing

[1] A.K. Parekh and R.G. Gallagher, “A generalized processor sharing approach to flow control in integrated services networks: the multiple node case,” IEEE/ACM Transactions on Networking (TON), 1994.

[2] S. Huh, J. Yoo, M. Kim and S. Hong, “Providing fair share scheduling on multicore cloud servers via virtual runtime-based task migration algorithm,” IEEE 32nd International Conference on Distributed Computing Systems (ICDCS), 2012.