Scheduler Design

Design of RT-Xen 2.x

[We focus on the design of the global EDF/RM schedulers. The partitioned schedulers differ only in that each CPU has its own RunQ and picks VCPUs only from that RunQ. The rest of the logic is the same.]

RT-Xen 2.x uses a global RunQ to hold all VCPUs scheduled by the real-time global EDF/RM schedulers. This RunQ has two parts: the first part holds VCPUs with budget, sorted by priority; the second part holds VCPUs without budget. The RunQ is protected by a spin-lock: whenever a CPU inserts a VCPU into or deletes a VCPU from the RunQ, it must first acquire this global RunQ lock.
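
The two-part RunQ described above can be sketched as follows. This is only an illustrative model, not the RT-Xen source: the class and field names are hypothetical, threading.Lock stands in for the spin-lock, and a smaller priority number is assumed to mean higher priority.

```python
import threading
from dataclasses import dataclass


@dataclass
class VCPU:
    name: str
    priority: int   # assumption: smaller value = higher priority
    budget_us: int  # remaining budget in the current period


class GlobalRunQ:
    """One queue shared by all CPUs, guarded by a single lock (hypothetical model)."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for the global RunQ spin-lock
        self._with_budget = []         # first part: sorted by priority
        self._depleted = []            # second part: VCPUs without budget

    def insert(self, vcpu):
        with self._lock:
            if vcpu.budget_us > 0:
                self._with_budget.append(vcpu)
                # list.sort is stable, so equal priorities keep arrival order
                self._with_budget.sort(key=lambda v: v.priority)
            else:
                self._depleted.append(vcpu)

    def remove(self, vcpu):
        with self._lock:
            part = self._with_budget if vcpu in self._with_budget else self._depleted
            part.remove(vcpu)

    def snapshot(self):
        """Names in queue order: VCPUs with budget first, then depleted ones."""
        with self._lock:
            return [v.name for v in self._with_budget + self._depleted]
```

Every insert and remove takes the same lock, which mirrors the point made above: all CPUs serialize on the global RunQ.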

How does the scheduler work in general?

Each CPU calls the rt_schedule() function every 1 ms. This function does the following things in sequence:

    1) Burn the budget of the VCPU currently running on this CPU;

    2) Replenish the budget of all VCPUs in the RunQ; if a VCPU does not need its budget replenished yet (because its next period has not started), the budget replenishment function simply skips it;

    3) Pick the highest-priority VCPU VP' from the RunQ that is feasible on this CPU. If VP' has higher priority than the VCPU VP currently running on this CPU, the scheduler picks VP' to run on this CPU for the next 1 ms and performs the context switch between VP' and VP; otherwise, the scheduler inserts VP' back into the RunQ and lets VP continue running on this CPU for the next 1 ms.
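
The three steps above can be sketched as a single function. This is a simplified model under stated assumptions, not the actual rt_schedule() code: the VCPU fields, the plain list used as the RunQ, and the "smaller number = higher priority" convention are all hypothetical, and locking is omitted. The sketch also replenishes the currently running VCPU alongside the RunQ, which is an assumption the text does not spell out.

```python
from dataclasses import dataclass

QUANTUM_US = 1000  # rt_schedule() is invoked every 1 ms


@dataclass
class VCPU:
    name: str
    priority: int         # assumption: smaller value = higher priority
    period_us: int
    budget_us: int        # full budget granted each period
    remaining_us: int     # budget left in the current period
    next_release_us: int  # start time of the next period
    allowed_cpus: frozenset


def rt_schedule(cpu, now_us, current, runq):
    """Return the VCPU that runs on `cpu` for the next quantum (None = idle)."""
    # 1) Burn the budget of the VCPU that ran during the last quantum.
    if current is not None:
        current.remaining_us = max(0, current.remaining_us - QUANTUM_US)

    # 2) Replenish VCPUs whose next period has started; skip all others.
    for v in runq + ([current] if current is not None else []):
        if now_us >= v.next_release_us:
            v.remaining_us = v.budget_us
            v.next_release_us += v.period_us

    # 3) Highest-priority VCPU with budget that is feasible on this CPU.
    candidates = [v for v in runq if v.remaining_us > 0 and cpu in v.allowed_cpus]
    best = min(candidates, key=lambda v: v.priority, default=None)

    current_ok = current is not None and current.remaining_us > 0
    if best is not None and (not current_ok or best.priority < current.priority):
        runq.remove(best)         # context switch: VP' replaces VP
        if current is not None:
            runq.append(current)
        return best
    if current_ok:
        return current            # VP' stays in the RunQ, VP keeps running
    if current is not None:
        runq.append(current)      # out of budget and nothing else to run
    return None
```

In this model a candidate that loses the comparison is never removed from the list, which plays the role of "inserting VP' back" in step 3.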

A VCPU has two categories of CPU masks:

    a) CPU mask: the set of CPUs this VCPU should run on, which is assigned with the "xl vcpu-pin" command; and

    b) CPU Pool mask: the CPU mask of the CPU Pool to which the domain of this VCPU is assigned, which is set with the "xl cpupool-migrate" command. (If you are not familiar with CPU Pools in Xen, you can assume that VCPUs do not have this mask, because by default it marks all CPUs as available to every VCPU.)

A feasible CPU for a VCPU is one whose index is in the set {CPU mask & CPU Pool mask}, that is, a CPU allowed by both masks.

An available CPU for a VCPU is a feasible CPU that is currently idle or is running a lower-priority VCPU.
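
The two definitions above can be illustrated with plain integer bitmasks. This is a minimal sketch: the function names, the use of None to mark an idle CPU, and the "smaller number = higher priority" convention are assumptions made for the example, not Xen data structures.

```python
IDLE = None  # marker for a CPU with nothing running on it


def feasible_cpus(cpu_mask, pool_mask, num_cpus):
    """Indices of CPUs whose bit is set in both the CPU mask and the CPU Pool mask."""
    both = cpu_mask & pool_mask
    return [i for i in range(num_cpus) if (both >> i) & 1]


def available_cpus(vcpu_prio, cpu_mask, pool_mask, running_prio):
    """Feasible CPUs that are idle or are running a lower-priority VCPU.

    running_prio[i] is the priority of the VCPU on CPU i, or IDLE.
    Assumption: a smaller number means a higher priority.
    """
    return [
        i
        for i in feasible_cpus(cpu_mask, pool_mask, len(running_prio))
        if running_prio[i] is IDLE or running_prio[i] > vcpu_prio
    ]
```

For example, with a CPU mask of 0b0111 and a CPU Pool mask of 0b1110, only CPUs 1 and 2 are feasible, since those are the only bits set in both masks.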

What happens when an interrupt is raised?

When an interrupt is raised on a VCPU VP_i, the scheduler will do the following operations in sequence:

    a) Insert this VCPU VP_i into the RunQ and sort the RunQ;

    b) Pick the first VCPU VP_j (highest priority VCPU) in the RunQ; (VP_j may or may not be VP_i.)

    c) Find a feasible CPU to run this highest-priority VCPU VP_j. If several CPUs are available, the order of preference is: idle CPU where VP_j was last running > idle CPU where VP_j was not running > CPU where a lower-priority VCPU is running.
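
The preference order in step c) can be sketched as below. The function and parameter names are hypothetical, None marks an idle CPU, and a smaller number is assumed to mean higher priority. The text does not say which CPU to choose when several run lower-priority VCPUs; this sketch assumes the one running the lowest-priority VCPU is preempted.

```python
def pick_cpu(vcpu_prio, last_cpu, feasible, running_prio):
    """Choose a CPU for the VCPU at the head of the RunQ, or None if none is available.

    feasible:     list of feasible CPU indices for this VCPU
    running_prio: running_prio[c] is the priority running on CPU c, or None if idle
    """
    idle = [c for c in feasible if running_prio[c] is None]

    # 1st choice: the idle CPU this VCPU ran on last.
    if last_cpu in idle:
        return last_cpu

    # 2nd choice: any other idle CPU.
    if idle:
        return idle[0]

    # 3rd choice: a CPU running a lower-priority VCPU.
    # Assumption: preempt the lowest-priority one among them.
    lower = [c for c in feasible if running_prio[c] > vcpu_prio]
    if lower:
        return max(lower, key=lambda c: running_prio[c])

    return None  # every feasible CPU runs a VCPU of equal or higher priority
```

When the function returns None, the VCPU simply stays in the RunQ until a later rt_schedule() call picks it up.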

Design of RT-Xen 1.x

We assume that every guest OS is equipped with one virtual CPU (VCPU), and that all VCPUs are pinned to one specific physical CPU (PCPU). In the Deferrable Server (DS), Periodic Server (PES), and Polling Server (POS), each VCPU has three parameters: budget, period, and priority. The Sporadic Server (SS) requires each VCPU to record two more parameters: status, and the total amount of budget consumed since the last execution.

Every PCPU is equipped with three queues: a Run Queue (RunQ), a Ready Queue (RdyQ), and a Replenishment Queue (RepQ). The RunQ and RdyQ store active VCPUs (those whose guest OS is not paused). The RunQ holds VCPUs that have tasks to run (regardless of budget), sorted by priority. Every time do_schedule is triggered, it inserts the currently running VCPU back into the RunQ or RdyQ, then picks the highest-priority VCPU with a positive budget from the RunQ and runs it for one quantum (we choose the quantum to be 1 ms). The RdyQ holds VCPUs that have no task to run but may still have budget. It is used for PES: for example, when a VCPU A moves to the RdyQ with budget remaining and the currently running VCPU has lower priority than A, A's budget must be consumed as well.
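
The interplay of RunQ and RdyQ in do_schedule can be sketched as follows. This is a simplified model with hypothetical names, not the RT-Xen 1.x code. It assumes that a smaller number means higher priority and that only the highest-priority idle VCPU in the RdyQ is charged under the PES rule. Budget accounting for the VCPU that actually runs is left out.

```python
from dataclasses import dataclass

QUANTUM_MS = 1


@dataclass
class VCPU:
    name: str
    priority: int   # assumption: smaller value = higher priority
    budget_ms: int
    has_work: bool  # True if the guest has a task to run


def do_schedule(current, runq, rdyq):
    """Requeue `current`, then return the VCPU to run for one quantum (None = idle)."""
    # Put the VCPU that just ran back: RunQ if it has work, RdyQ otherwise.
    if current is not None:
        (runq if current.has_work else rdyq).append(current)
    runq.sort(key=lambda v: v.priority)
    rdyq.sort(key=lambda v: v.priority)

    # Highest-priority VCPU in the RunQ that still has a positive budget.
    nxt = next((v for v in runq if v.budget_ms > 0), None)
    if nxt is not None:
        runq.remove(nxt)

    # PES rule: an idle VCPU with budget that outranks `nxt` burns budget too.
    for v in rdyq:
        if nxt is not None and v.budget_ms > 0 and v.priority < nxt.priority:
            v.budget_ms -= QUANTUM_MS
            break  # assumption: only the highest-priority idle VCPU is charged
    return nxt
```

Keeping idle VCPUs in a separate RdyQ lets the scheduler find them cheaply when their budget has to be consumed, without mixing them into the queue of runnable VCPUs.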

The RepQ stores replenishment information for all the VCPUs on the specific PCPU. Every element in the RepQ contains three parameters: the VCPU to replenish, the replenishment time, and the replenishment amount. At each scheduling quantum, the RepQ is checked and any replenishment that is due is performed. If the replenished VCPU has higher priority than the currently running one, an interrupt is also raised to trigger the do_schedule function, which performs the context switch. For DS, PES, and POS, the RepQ is independent of the VCPU's actual execution: every VCPU receives a replenishment with a fixed period. For SS, insertion into the RepQ is decided dynamically by the status of the VCPU.
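
A RepQ of this shape can be modelled as a min-heap ordered by replenishment time. The sketch below uses hypothetical names and makes several assumptions: a smaller number means higher priority, the replenishment amount is simply added to the remaining budget, an idle PCPU always triggers a reschedule, and a non-zero period_ms re-arms the entry to model the fixed-period case (DS, PES, POS). The dynamic insertion used by SS is not modelled.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Replenishment:
    time_ms: int                                     # when to replenish (heap key)
    vcpu: str = field(compare=False)                 # which VCPU to replenish
    amount_ms: int = field(compare=False)            # how much budget to add
    period_ms: int = field(default=0, compare=False) # 0 = one-shot entry


def process_repq(repq, now_ms, budgets, priorities, running):
    """Apply all due replenishments; return True if do_schedule should be triggered."""
    reschedule = False
    while repq and repq[0].time_ms <= now_ms:
        r = heapq.heappop(repq)
        budgets[r.vcpu] += r.amount_ms  # assumption: amount is added to the budget
        # A replenished VCPU that outranks the running one forces a reschedule.
        if running is None or priorities[r.vcpu] < priorities[running]:
            reschedule = True
        # Fixed-period servers get their next replenishment queued immediately.
        if r.period_ms > 0:
            heapq.heappush(
                repq,
                Replenishment(r.time_ms + r.period_ms, r.vcpu, r.amount_ms, r.period_ms),
            )
    return reschedule
```

Calling process_repq once per quantum corresponds to the per-quantum check described above; the return value stands in for raising the interrupt that triggers do_schedule.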