Load Balance in Linux 2.6.32
Load balancing
Sung-joon Choi
Real-Time Operating Systems Lab.
Seoul National University
2011-09-15

Contents
Load balancing
  Purpose
  Definition
  General cases
    • Active load balancing
    • Passive load balancing
  Special cases
    • Execution of a new task
    • CPU shutdown or intentionally being IDLE
Limitation

Load Balancing
Purpose
  As long as the system has more tasks than cores, keep every core running without entering the IDLE state
Mechanism
  Keep the difference in workload between cores small
Definition
  Load balancing
    • In an SMP architecture, adjusting the system so that each core carries an equal workload (load)
  Load
    • The sum of the weights of all tasks in a core’s run-queue

Load Balancing
Definition (cont.)
  Idlest run-queue
    • The run-queue that has the minimum load among the cores
  Busiest run-queue
    • The run-queue that has the maximum value of the scaled load “load / (core’s power)”
    • If all cores have the same power, this is the run-queue of the core with the maximum load
    • In a system with heterogeneous processors, each core’s power may differ
    • In general, power means capacity, i.e. the ability to get work done
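The definitions above can be illustrated with a small model. This is only a sketch: the `RunQueue` class, the `power` values, and the task weights are hypothetical and are not kernel data structures. It shows that the busiest run-queue is chosen by "load / power", so it need not be the one with the largest raw load.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RunQueue:
    """Hypothetical model of one core's run-queue (not a kernel structure)."""
    cpu: int
    power: float = 1.0                                 # relative capacity of the core
    weights: List[int] = field(default_factory=list)   # weight of each queued task

    @property
    def load(self) -> int:
        # Load = sum of the weights of all tasks in the run-queue.
        return sum(self.weights)


def idlest(rqs: List[RunQueue]) -> RunQueue:
    # Idlest run-queue: minimum load among the cores.
    return min(rqs, key=lambda rq: rq.load)


def busiest(rqs: List[RunQueue]) -> RunQueue:
    # Busiest run-queue: maximum of load scaled by the core's power.
    return max(rqs, key=lambda rq: rq.load / rq.power)


# Three cores; CPU 1 is assumed to be twice as powerful as the others.
rqs = [
    RunQueue(cpu=0, power=1.0, weights=[1024, 1024]),        # load 2048, scaled 2048
    RunQueue(cpu=1, power=2.0, weights=[1024, 1024, 1024]),  # load 3072, scaled 1536
    RunQueue(cpu=2, power=1.0, weights=[1024]),              # load 1024, scaled 1024
]

print("busiest:", busiest(rqs).cpu)  # CPU 0, although CPU 1 has the larger raw load
print("idlest: ", idlest(rqs).cpu)   # CPU 2
```

With equal power on every core, `busiest` reduces to picking the run-queue with the maximum load, as stated in the definition.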
General cases (mainly focused part)

General Cases
Active Load Balancing
  [Figure: sequence of Core 0 / Core 1 snapshots, each core with a run-queue and a current task. Annotations: “Run-queue is empty”, “Core 1 is going to IDLE”, “Task migration”. Legend: READY, RUNNING, Going to DEAD. Assumption: all tasks have the same weight.]

General Cases
Active Load Balancing
Implementation
  When a task reaches the end of its execution, do_exit()
    • Sets the task’s state to “TASK_DEAD”
    • Calls schedule()
      – In its back-end procedure, if the core’s state is IDLE, it calls “idle_balance()”
      – idle_balance()
        » To pull a task from the busiest core’s run-queue, it calls “load_balance()”
        » load_balance() performs the task migration

General Cases
Active Load Balancing
Drawback
  Active load balancing alone can often achieve load balance. However, when the workload gap between cores is large but every task runs for a long time, no core becomes IDLE for a while, and the goal of load balancing cannot be met in the short term.
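The active path and its drawback can be sketched with a toy simulation. Everything here is an assumption made for illustration: the function names mirror the slide but this is not kernel code, all tasks have the same weight (so load equals queue length), and the current task simply runs to completion instead of time-sharing.

```python
from typing import List


def idle_balance(queues: List[List[int]], this_cpu: int) -> bool:
    """Pull one waiting task from the busiest run-queue into the empty one.

    Each queue is a list of remaining run times; index 0 is the current task.
    """
    busiest = max(range(len(queues)), key=lambda c: len(queues[c]))
    if busiest == this_cpu or len(queues[busiest]) < 2:
        return False                                  # nothing that can be pulled
    queues[this_cpu].append(queues[busiest].pop())    # migrate a READY task
    return True


def run(queues: List[List[int]], ticks: int) -> int:
    """Advance the toy system by `ticks`; return the number of migrations."""
    migrations = 0
    for _ in range(ticks):
        for cpu, q in enumerate(queues):
            if q:
                q[0] -= 1                             # current task runs one tick
                if q[0] == 0:
                    q.pop(0)                          # task goes to DEAD
            if not q and idle_balance(queues, cpu):   # core is about to go IDLE
                migrations += 1
    return migrations


# Short task on core 1: it exits, the core would go IDLE, so a task is pulled.
short = [[1, 5, 5, 5], [1]]
short_migrations = run(short, ticks=1)
print(short_migrations, short)

# Long-running tasks: no core goes IDLE, so the 4-versus-1 gap is never fixed.
long_running = [[100, 100, 100, 100], [100]]
long_migrations = run(long_running, ticks=50)
print(long_migrations, [len(q) for q in long_running])
```

In the second scenario balancing is never triggered during the 50 ticks, because the only trigger in this model is a core running out of work.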
  To avoid this situation, periodic balancing is needed.

General Cases
Passive (Periodic) Load Balancing
  [Figure: sequence of Core 0 / Core 1 snapshots, each core with a run-queue and a current task. Annotations: “For a long time, there is no IDLE core”, “If there is a big gap of load between cores, it is uncomfortable”, “Periodic check”, “Task migration”. Legend: READY, RUNNING, Going to DEAD, Busiest run-queue, Idlest run-queue. Assumption: all tasks have the same weight.]

General Cases
Passive Load Balancing
  Triggered by scheduler_tick()
    The tick value is compared with the parameter “next_balance”, which is the time at which to do load balancing
      • Each run-queue has its own “next_balance”
      • If a core performs active load balancing, the parameter is set to 1 second later
      • If a core performs passive load balancing, the parameter is set to 1 minute later
      • The difference between 1 second and 1 minute exists because a core that balanced while IDLE is likely to become IDLE again, so it should be rebalanced promptly
  Executed by a bottom-half handler
    The softirq named “SCHED_SOFTIRQ” is handled by “run_rebalance_domains()”

General Cases
Passive Load Balancing
Implementation
  [Figure, built up over the start slide and steps 1 to 5: Core 0 and Core 1, each with a run-queue, a current task, and a “Next_balance” value; a softirq table whose entry SCHED_SOFTIRQ points to the handler function (bottom-half handler) run_rebalance_domains(); the kernel thread ksoftirqd placed on a core. Legend: READY, RUNNING, Busiest run-queue, Idlest run-queue. Assumption: all tasks have the same weight.]
  Start: the timer interrupt invokes “scheduler_tick()”
  If the tick value is equal to or greater than the parameter “next_balance”:
    Step 1: raises the softirq “SCHED_SOFTIRQ” to the kernel
    Step 2: finds the idlest run-queue on which to invoke the kernel thread “ksoftirqd”
    Step 3: the thread executes the function “do_ksoftirqd()”, which picks a softirq and calls its handler function
    Step 4: the handler function finds the busiest run-queue to pull a task from
    Step 5: task migration
  [Figure: Core 0 / Core 1 run-queues, current tasks, and “Next_balance” values before and after the migration.]

General Cases
Passive Load Balancing
Drawback
  This algorithm has a large overhead
    • The algorithm must find the maximum and minimum load among all cores
    • And, if the current core is not the idlest one,
      – The kernel thread “ksoftirqd” must be enqueued on the idlest
run-queue of another core and woken up
      – Also, the current task of the target core (the one with the idlest run-queue) is preempted by “ksoftirqd”
  Tradeoff: balancing time interval, throughput, latency

Special Cases
Execution of a new task
  When a new task is created on a core, the kernel checks whether the core’s load is reasonable for handling a new task
    • If the load is unacceptable, the core’s current task is migrated to the idlest core’s run-queue and rescheduled
    • The new task is then executed on the original core (not the idlest core)
CPU shutdown or intentionally being IDLE
  When a core must be shut down or intentionally kept IDLE, such as in POWER_SAVING_LOAD_BALANCE
    • All tasks in its run-queue are migrated to other cores
    • In practice, this case is just a task migration

Limitation
Global Fairness
  Global fairness is the degree to which, on an SMP system with multiple CPUs, every task is guaranteed run-time in proportion to its weight.
  In an SMP environment each CPU has its own run-queue, and load balancing moves tasks by considering only each run-queue’s load (sum of weights), so a task may fail to receive time in proportion to its weight.
    • Example) On a dual-core CPU with task1, task2, and task3 of equal weight, task1 is in CPU1’s run-queue while task2 and task3 go into CPU2’s run-queue. In this case load balancing rarely occurs, so the tasks are not guaranteed equal run-time even though they have equal weights.

End
Q & A?
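The dual-core example from the Limitation slide can be worked through numerically. This is a sketch under stated assumptions: each run-queue divides its own CPU among its tasks in proportion to weight, the weight value 1024 is arbitrary, and the "ideal" share is a simple proportional split of all CPUs capped at one full CPU per task. The helper names are hypothetical.

```python
from fractions import Fraction
from typing import Dict

Placement = Dict[str, Dict[str, int]]   # cpu name -> {task name: weight}


def actual_shares(placement: Placement) -> Dict[str, Fraction]:
    """Fraction of one CPU each task gets when every run-queue is fair locally."""
    shares = {}
    for tasks in placement.values():
        total = sum(tasks.values())      # load of this run-queue
        for name, weight in tasks.items():
            shares[name] = Fraction(weight, total)
    return shares


def ideal_shares(placement: Placement) -> Dict[str, Fraction]:
    """Globally fair share: proportional to weight across all CPUs (capped at 1)."""
    n_cpus = len(placement)
    weights = {n: w for tasks in placement.values() for n, w in tasks.items()}
    total = sum(weights.values())
    return {n: min(Fraction(1), Fraction(w * n_cpus, total))
            for n, w in weights.items()}


placement = {
    "CPU1": {"task1": 1024},
    "CPU2": {"task2": 1024, "task3": 1024},
}

actual = actual_shares(placement)
ideal = ideal_shares(placement)
for task in sorted(actual):
    print(task, "actual:", actual[task], "ideal:", ideal[task])
```

Under these assumptions task1 receives a whole CPU while task2 and task3 receive half a CPU each, whereas a globally fair split would give each task two thirds of a CPU. Moving one task from CPU2 to CPU1 only mirrors the imbalance (loads 2048 and 1024 instead of 1024 and 2048), which is consistent with the slide's remark that load balancing rarely occurs here.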