IO and Interrupt Virtualization

Masum Z. Hasan All Rights Reserved

x86 Architecture Basics.

We focus on network IO and IRQ.

Full Software-based Virtualization

As I mentioned here, shared resources are virtualized via full software emulation, trap-and-emulate, or hardware-assisted virtualization. Assuming no hardware-assisted virtualization support, the following is a conceptual description of how IO and interrupts can be virtualized, explained with the following figure. I’ll cover VM-to-VM communication and communication from outside the host to a VM.

Components shown above the broken line can reside in userspace (such as in QEMU, or in vhost-user with a (DPDK) virtual switch in userspace).

NIC Virtualization

IO, which in this case is a NIC, can be virtualized via emulation in software, such as the Intel E1000 NIC emulation in QEMU (QEMU code for E1000 emulation based on the E1000 spec). You will see in the code that all or most of the capabilities specified in the spec are emulated, including interrupt generation; that is, virtual interrupts are emitted by the code.
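To make "virtual interrupts are emitted by the code" concrete, here is a minimal sketch in C of how an emulated NIC might raise an interrupt when a packet arrives. It loosely follows the interrupt-cause and interrupt-mask register pattern used by E1000-class devices, but the struct, the function names, and the bit value are hypothetical simplifications for illustration, not QEMU's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit for "packet received"; a real device defines many causes. */
#define NIC_CAUSE_RX 0x80u

/* Simplified emulated NIC state (an assumption for illustration only). */
struct emu_nic {
    uint32_t cause;     /* pending interrupt causes (like a cause register) */
    uint32_t mask;      /* causes the guest driver has enabled */
    bool     irq_level; /* state of the virtual interrupt line */
};

/* Recompute the virtual IRQ line: asserted iff an enabled cause is pending. */
static void nic_update_irq(struct emu_nic *nic)
{
    nic->irq_level = (nic->cause & nic->mask) != 0;
}

/* Called by the emulator when a packet arrives for the guest. */
static void nic_receive_packet(struct emu_nic *nic)
{
    nic->cause |= NIC_CAUSE_RX;
    nic_update_irq(nic);
}

/* Guest driver write that enables interrupt causes. */
static void nic_write_mask_set(struct emu_nic *nic, uint32_t bits)
{
    nic->mask |= bits;
    nic_update_irq(nic);
}

/* Guest driver read of the cause register: returns and clears pending causes. */
static uint32_t nic_read_cause(struct emu_nic *nic)
{
    uint32_t pending = nic->cause;
    nic->cause = 0;
    nic_update_irq(nic);
    return pending;
}
```

In this sketch the interrupt line is purely a field in memory. In a full emulator, a change of irq_level would be forwarded to the emulated interrupt controller rather than to physical hardware.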

IO can also be virtualized via para-virtualization, such as virtio (spec).

LAPIC Virtualization

A VM may be assigned one or more virtual CPUs (VCPUs). As we have seen here, each core has a LAPIC associated with it. Hence, in a fully virtualized environment, a VCPU has to have a virtualized LAPIC associated with it. The LAPIC registers and operations (see here for the operations performed by LAPIC hardware after EOI), which are otherwise implemented in LAPIC hardware, have to be emulated. An example of LAPIC virtualization via software emulation can be found here.

static int apic_set_eoi(struct kvm_lapic *apic)
{
    /* Highest-priority vector currently in service, or -1 if none */
    int vector = apic_find_highest_isr(apic);

    trace_kvm_eoi(apic, vector);

    /*
     * Not every EOI write has a corresponding ISR bit;
     * one example is when the kernel checks the timer in setup_IO_APIC
     */
    if (vector == -1)
        return vector;

    /* Retire the vector and recompute the processor priority */
    apic_clear_isr(vector, apic);
    apic_update_ppr(apic);

    if (test_bit(vector, vcpu_to_synic(apic->vcpu)->vec_bitmap))
        kvm_hv_synic_send_eoi(apic->vcpu, vector);

    /* Propagate the EOI to the emulated IOAPIC and re-evaluate pending events */
    kvm_ioapic_send_eoi(apic, vector);
    kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
    return vector;
}
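The function above depends on KVM internals, so it cannot be run in isolation. As a rough companion, here is a self-contained toy model in C of the LAPIC state it manipulates: 256-bit IRR and ISR bitmaps plus the priority registers. All names are hypothetical and the model leaves out most of what a real LAPIC does (timers, IPIs, delivery modes); it only illustrates the raise, accept, and EOI cycle.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of LAPIC state (hypothetical names, not KVM's structures). */
struct toy_lapic {
    uint32_t irr[8]; /* Interrupt Request Register: 256 bits, pending */
    uint32_t isr[8]; /* In-Service Register: 256 bits, being handled */
    uint8_t  tpr;    /* Task Priority Register */
    uint8_t  ppr;    /* Processor Priority Register */
};

static void set_bit256(uint32_t *map, int v)   { map[v / 32] |=  (1u << (v % 32)); }
static void clear_bit256(uint32_t *map, int v) { map[v / 32] &= ~(1u << (v % 32)); }

/* Return the highest set vector in a 256-bit map, or -1 if none is set. */
static int highest_bit256(const uint32_t *map)
{
    for (int v = 255; v >= 0; v--)
        if (map[v / 32] & (1u << (v % 32)))
            return v;
    return -1;
}

/* PPR follows the higher of the TPR class and the highest in-service class. */
static void lapic_update_ppr(struct toy_lapic *apic)
{
    int isrv = highest_bit256(apic->isr);
    uint8_t isr_class = (isrv < 0) ? 0 : (uint8_t)(isrv & 0xF0);
    apic->ppr = ((apic->tpr & 0xF0) >= isr_class) ? apic->tpr : isr_class;
}

/* A device model raises a vector: mark it pending in the IRR. */
static void lapic_raise(struct toy_lapic *apic, int vector)
{
    assert(vector >= 0 && vector <= 255);
    set_bit256(apic->irr, vector);
}

/* Move the highest pending vector to in-service if its class exceeds the PPR class. */
static int lapic_accept(struct toy_lapic *apic)
{
    int vector = highest_bit256(apic->irr);
    if (vector < 0 || (vector & 0xF0) <= (apic->ppr & 0xF0))
        return -1;
    clear_bit256(apic->irr, vector);
    set_bit256(apic->isr, vector);
    lapic_update_ppr(apic);
    return vector;
}

/* Guest EOI write: retire the highest in-service vector, or -1 if there is none. */
static int lapic_eoi(struct toy_lapic *apic)
{
    int vector = highest_bit256(apic->isr);
    if (vector < 0)
        return -1;
    clear_bit256(apic->isr, vector);
    lapic_update_ppr(apic);
    return vector;
}
```

For example, after raising vectors 0x31 and 0x52, the first lapic_accept call returns 0x52 and blocks 0x31 until lapic_eoi retires 0x52, because 0x31 is in a lower priority class.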

Interrupt Handling Virtualization

With full software-based virtualization

As shown in the figure above, VM-to-VM communication is handled entirely in software. When the E1000 emulator receives a packet, it generates a (virtual) IRQ and reports it to the respective emulated LAPIC, which then reports it to the respective VCPU (the sequence is almost exactly what the hardware does in a non-virtualized environment). In the non-virtualized case, once the CPU receives the IRQ, it transitions into CPL=0 mode automatically (in hardware). Then, by vectoring into the IDT, the registered interrupt handler is invoked in CPL=0 mode.

Hence, in a virtualized environment, first, the transition to CPL=0 has to be forced, which can be done via INT <vector #>; second, the IRQ has to be vectored into the guest OS IDT so that the interrupt handler can be invoked in CPL=0 mode. One possible way to do this is to maintain a shadow IDT in the hypervisor, which the hypervisor can build during guest initialization by monitoring interrupt handler and IRQ registrations. Multiple guest OSes (of the same type or not, such as Linux and Windows) may have overlapping IRQs, which the hypervisor can handle by multiplexing (as is done in sys_syscall, which we discussed before).

Once the interrupt is handled, the guest OS has to send an EOI, which causes an exit to the hypervisor so that the EOI can be emulated in the LAPIC emulator. In addition, multiple exits will happen while the driver and guest OS perform Rx/Tx of packets.
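The shadow IDT idea can be pictured with a small sketch in C. It assumes one table per guest, filled in when the hypervisor observes a handler registration, and a dispatch routine that vectors a virtual IRQ to the right guest's handler. Every name here is hypothetical, and plain function pointers stand in for guest handler addresses; a real hypervisor would instead arrange for the guest to enter its handler at CPL=0.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_VECTORS 256
#define MAX_GUESTS  4

/* Stand-in for a guest interrupt handler entry point (hypothetical). */
typedef void (*guest_handler_t)(int vector);

/* One shadow IDT per guest, so overlapping vectors across guests do not clash. */
struct shadow_idt {
    guest_handler_t handler[NUM_VECTORS];
};

static struct shadow_idt shadow_idts[MAX_GUESTS];

/* Hypervisor hook: record a handler when the guest is seen registering one. */
static int shadow_idt_register(int guest, int vector, guest_handler_t h)
{
    if (guest < 0 || guest >= MAX_GUESTS || vector < 0 || vector >= NUM_VECTORS)
        return -1;
    shadow_idts[guest].handler[vector] = h;
    return 0;
}

/* Vector a virtual IRQ into the target guest's handler; -1 if none is registered. */
static int shadow_idt_dispatch(int guest, int vector)
{
    if (guest < 0 || guest >= MAX_GUESTS || vector < 0 || vector >= NUM_VECTORS)
        return -1;
    guest_handler_t h = shadow_idts[guest].handler[vector];
    if (h == NULL)
        return -1;
    h(vector);
    return 0;
}

/* Example guest handlers that only count invocations. */
static int linux_rx_count, windows_rx_count;
static void linux_nic_handler(int vector)   { (void)vector; linux_rx_count++; }
static void windows_nic_handler(int vector) { (void)vector; windows_rx_count++; }
```

Keying the table by guest first and vector second is what lets two guests use the same vector number without interfering with each other.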

Interrupts coming from hardware NICs are handled by the host IDT and handlers, as in the non-virtualized case. If the packets are going to a VM, then the handling of interrupts after they cross the virtual switch is as described above. Note that a hardware IRQ will interrupt any task (VM, host OS, hypervisor, other non-VM userspace) executing on the CPU or core receiving the IRQ, causing an involuntary exit of the executing task.

With hardware-assisted virtualization

A VM should occupy a CPU for as long as possible. Hence frequent VM exits due to IO, interrupt and EOI handling should be avoided. With hardware-assisted virtualization support (which Linux KVM drives via /dev/kvm), the major registers are emulated in the VMCS (as discussed here). One of the registers in the VMCS is the IDT register (IDTR). In addition, there are VM-execution control fields in the VMCS for controlling interrupt handling. Some CPU hardware capabilities, such as the LAPIC, are also emulated in hardware (yes, hardware capabilities emulated in hardware!): the LAPIC registers and relevant operations are emulated in hardware (virtual LAPIC or LAPIC virtualization). Hence a virtual interrupt can be delivered to the virtual LAPIC of a VM, and from there to the CPU executing the VM in VMX non-root mode (and CPL=0). Since the guest's IDT register is in the VMCS, the virtual IRQ can vector into the guest IDT directly and the interrupt handler can be invoked directly. When the guest writes the EOI, that write is virtualized as well. All of this happens without any VM exit. The following figure describes this.

Here is an excerpt from the Intel manual on virtual-interrupt delivery:

If a virtual interrupt has been recognized (see Section 29.2.1), it is delivered at an instruction boundary when the following conditions all hold: (1) RFLAGS.IF = 1; (2) there is no blocking by STI; (3) there is no blocking by MOV SS or by POP SS; and (4) the “interrupt-window exiting” VM-execution control is 0.

Virtual-interrupt delivery has the same priority as that of VM exits due to the 1-setting of the “interrupt-window exiting” VM-execution control. Thus, non-maskable interrupts (NMIs) and higher priority events take priority over delivery of a virtual interrupt; delivery of a virtual interrupt takes priority over external interrupts and lower priority events.

Virtual-interrupt delivery wakes a logical processor from the same inactive activity states as would an external interrupt. Specifically, it wakes a logical processor from the states entered using the HLT and MWAIT instructions. It does not wake a logical processor in the shutdown state or in the wait-for-SIPI state.

Virtual-interrupt delivery updates the guest interrupt status (both RVI and SVI; see Section 24.4.2) and delivers an event within VMX non-root operation without a VM exit. The following pseudocode details the behavior of virtual-interrupt delivery (see Section 29.1.1 for definition of VISR, VIRR, and VPPR):

Vector ← RVI;
VISR[Vector] ← 1;
SVI ← Vector;
VPPR ← Vector & F0H;
VIRR[Vector] ← 0;
IF any bits set in VIRR
    THEN RVI ← highest index of bit set in VIRR
    ELSE RVI ← 0;
FI;
deliver interrupt with Vector through IDT;
cease recognition of any pending virtual interrupt;
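To follow the pseudocode step by step, it can help to model it in ordinary C. The sketch below is only a software illustration of what the processor does in hardware: the struct layouts and field names are assumptions, the final "deliver through IDT" step is reduced to returning the vector, and the check that a virtual interrupt has been recognized in the first place is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the virtual-APIC state named in the pseudocode (layout assumed). */
struct vapic {
    uint32_t virr[8]; /* virtual IRR, 256 bits */
    uint32_t visr[8]; /* virtual ISR, 256 bits */
    uint8_t  vppr;    /* virtual processor-priority register */
    uint8_t  rvi;     /* requesting virtual interrupt */
    uint8_t  svi;     /* servicing virtual interrupt */
};

/* The four delivery conditions from the excerpt, as plain flags. */
struct guest_state {
    bool rflags_if;
    bool blocked_by_sti;
    bool blocked_by_mov_ss;
    bool interrupt_window_exiting;
};

/* True when all four conditions for delivery at an instruction boundary hold. */
static bool can_deliver(const struct guest_state *g)
{
    return g->rflags_if && !g->blocked_by_sti &&
           !g->blocked_by_mov_ss && !g->interrupt_window_exiting;
}

/* Mirrors the pseudocode; returns the vector that would go through the IDT. */
static uint8_t deliver_virtual_interrupt(struct vapic *v)
{
    uint8_t vector = v->rvi;                        /* Vector <- RVI */
    v->visr[vector / 32] |= 1u << (vector % 32);    /* VISR[Vector] <- 1 */
    v->svi = vector;                                /* SVI <- Vector */
    v->vppr = (uint8_t)(vector & 0xF0);             /* VPPR <- Vector & F0H */
    v->virr[vector / 32] &= ~(1u << (vector % 32)); /* VIRR[Vector] <- 0 */
    v->rvi = 0;                                     /* ELSE case: no bits set */
    for (int i = 255; i >= 0; i--) {                /* highest bit set in VIRR */
        if (v->virr[i / 32] & (1u << (i % 32))) {
            v->rvi = (uint8_t)i;
            break;
        }
    }
    return vector;
}
```

With vectors 0x41 and 0x23 pending and RVI set to 0x41, one delivery moves 0x41 into service, sets VPPR to 0x40, and leaves 0x23 as the new RVI.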

Example traces from real execution in a virtualized environment.

Inside VM:
