X86 Architecture Basics: Privilege Levels and Registers

Masum Z. Hasan All Rights Reserved

Privilege Levels

In a virtualized system only the hypervisor controls the hardware resources. Hence the hypervisor has to execute at the highest privilege level, while any user space program has to execute at a level below that. An x86 computer has 4 privilege levels, though only two are typically used: level (or ring) 0 for the OS or hypervisor and level 3 for user space programs. When a program runs on the CPU, the two low-order bits of the code segment (CS) register indicate the current privilege level (CPL) of that program. We will use the terms privileged mode, Ring 0 and CPL=0 interchangeably.
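
As a minimal sketch of how the CPL is read out of a CS value, the following Python snippet decodes a segment selector into its fields. The two example selector values are assumptions taken from 64-bit Linux (kernel CS 0x10, user CS 0x33) and are used purely for illustration; other operating systems use different values. The function name decode_selector is hypothetical.

```python
# Segment selector layout on x86:
#   bits 0-1  : privilege level (the CPL when the selector is loaded in CS)
#   bit  2    : table indicator (0 = GDT, 1 = LDT)
#   bits 3-15 : index into the descriptor table

def decode_selector(selector: int) -> dict:
    """Split a 16-bit segment selector into its three fields."""
    return {
        "privilege_level": selector & 0x3,
        "table": "LDT" if selector & 0x4 else "GDT",
        "index": selector >> 3,
    }

# Assumed example values: CS selectors used by 64-bit Linux.
KERNEL_CS = 0x10
USER_CS = 0x33

for name, selector in (("kernel", KERNEL_CS), ("user", USER_CS)):
    print(name, decode_selector(selector))
```

With these assumed values the kernel selector decodes to privilege level 0 and the user selector to privilege level 3.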

Privileged and Non-Privileged Instructions and Registers

The characteristics of an instruction set architecture (ISA), such as the x86, can affect whether a computer supporting that ISA is directly virtualizable. Let us look into some of the characteristics of the x86 ISA that potentially affect virtualization of the CPU.

Certain registers are privileged: they can be accessed or modified only in privileged mode (Ring 0 or CPL=0), or only at specific privilege levels. A few examples are as follows:

  • FLAGS register (EFLAGS: 32-bit, RFLAGS: 64-bit): This register has many flags. We will refer to the following in our examples:

    • Interrupt enable (IF) flag (one bit), which controls enabling or disabling of maskable interrupts (see below). If IF is set to 1, a maskable interrupt can interrupt the processor; if 0, it cannot.

    • IO privilege level (IOPL) flag (2 bits). A task or program can access IO ports if its current privilege level (CPL) is less than or equal to the IOPL. When CPL > IOPL, a task may still be granted access to individual ports via the IO permission bitmap. The POPF instruction (POPF: 16-bit, POPFD: 32-bit, POPFQ: 64-bit) pops the top of the stack into the FLAGS register, as a result of which the IOPL and the IF flags can be modified. The IOPL can be modified by the POPF and IRET instructions only when CPL is 0. A program can modify the IF flag only if its CPL is less than or equal to the IOPL.

  • Control registers are privileged registers. A few examples are as follows:

    • The CR0 register is used to enable or disable certain processor features, such as protected mode (PE bit) and memory paging (PG bit), which turns on virtual addressing.

    • The CR2 register is loaded by the processor, when a page fault occurs, with the faulting linear (virtual) address, that is, the address whose access caused the fault.

    • The CR3 register is loaded with the physical address of the root of the paging structures (the page directory in 32-bit paging), from which the processor walks the page tables to locate a memory page.

  • Debug registers DR0-DR7 are also privileged registers.

  • IO registers: These are registers or memory on IO devices or ports, such as the PCIe [REF] BAR (base address register). These registers are privileged.
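
The IF and IOPL fields of the FLAGS register described in the list above can be decoded with a few bit operations. The sketch below assumes the standard bit positions (IF at bit 9, IOPL at bits 12-13); the helper names and the sample FLAGS values are hypothetical and chosen only for illustration.

```python
IF_BIT = 9        # interrupt enable flag
IOPL_SHIFT = 12   # IOPL occupies bits 12-13
IOPL_MASK = 0x3

def interrupts_enabled(flags: int) -> bool:
    """True if maskable interrupts can interrupt the processor (IF = 1)."""
    return bool((flags >> IF_BIT) & 1)

def iopl(flags: int) -> int:
    """Extract the 2-bit IO privilege level from a FLAGS value."""
    return (flags >> IOPL_SHIFT) & IOPL_MASK

def may_access_io_ports(cpl: int, flags: int) -> bool:
    """IOPL check only: ports are accessible when CPL <= IOPL.
    The IO bitmap, which can grant access to individual ports, is not modeled."""
    return cpl <= iopl(flags)

# Hypothetical sample values: IF = 1 with IOPL = 0, and IF = 1 with IOPL = 3.
for flags in (0x0202, 0x3202):
    print(hex(flags), interrupts_enabled(flags), iopl(flags),
          may_access_io_ports(3, flags))
```

With IOPL = 0 a ring 3 program fails the check, while with IOPL = 3 it passes.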

CPU instructions are categorized as non-privileged, privileged, sensitive, and privileged without exception, as follows:

  • Non-privileged Instructions: CPU instructions that do not have to be run in privileged mode. An example is a MOV (move one operand to another) instruction that does not operate on any of the privileged registers described above.

  • Privileged instructions [Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A: System Programming Guide, Part 1, Section 5.9] are instructions that can only be executed at a CPL of 0. Any attempt to execute these instructions at a less privileged level will result in a general protection (#GP) exception or fault (see below for definition). Following are a few examples of privileged instructions:

    • HLT: Halt CPU till next interrupt.

    • INVLPG: Invalidate a page entry in the translation look-aside buffer (TLB).

    • LIDT: Load the Interrupt Descriptor Table register (IDTR).

    • MOV CR registers: load or store control registers. In this case the MOV instruction (a non-privileged instruction on its own) is accessing a privileged register.

    • RDMSR, WRMSR: Read / write model specific registers (MSR).

  • Sensitive Instructions: These are instructions that are IOPL sensitive. They can only be executed when CPL <= IOPL; otherwise a general protection (#GP) exception results. Examples are as follows:

    • CLI: Clear the interrupt flag, that is, set the IF bit of the FLAGS register to 0. When the bit is cleared, maskable external interrupts (see below for definition) are ignored (not handled). Clearing it has no effect on exceptions and non-maskable interrupts or NMI (see below), which are handled even if this bit is cleared.

    • STI: Set the interrupt flag, that is, set the IF bit of the FLAGS register to 1. In this case maskable external interrupts are handled after the next instruction. It has no effect on exceptions and NMI.

  • Privileged IO instructions to access IO ports (port-mapped IO): These instructions are used to access IO ports through a separate IO address space. They are IOPL sensitive: they execute when CPL <= IOPL; otherwise the IO permission bitmap is consulted, and a #GP exception results if access to the port is not permitted.

    • IN / OUT: move data between an I/O port and a CPU register (AL, AX or EAX). The address of the I/O port can be an immediate operand or contained in the DX register. The other instructions are INS and OUTS, which move string data between an IO port and memory.

  • Privileged instructions without exception: These instructions behave differently depending on the privilege level, but do not raise an exception when executed with insufficient privilege.

    • POPF and IRET instructions to modify IOPL: These instructions can change the IOPL flag only if they are executed at CPL=0. An attempt to change the IOPL by executing these instructions at CPL > 0 is simply ignored, and no exception is generated.

    • POPF to modify IF: A program can modify the IF flag by executing the POPF instruction if the program’s CPL <= IOPL. An attempt by a program with CPL > IOPL is simply ignored, and no exception is generated.

  • Privileged instructions for IO register access: We have discussed port-mapped IO above. IO device registers can also be accessed via a method called memory-mapped IO (MMIO), where the device registers are mapped into a region of the physical memory address space so that regular memory read or write instructions (such as MOV) can be used to perform IO. For PCI devices, the base address register (BAR) is programmed with the start address (MA1) of the memory region the device is mapped to. A register then “resides” at an offset from MA1. For example, the RBSTART or RxBuf register of the RTL8139 NIC [RTL8139_Spec] resides at an offset of 0x30. Hence the execution of MOV %eax, <MA1 of RTL8139>+0x30 will result in writing the content of the eax register to the RxBuf register. A conceptual depiction of MMIO is shown below. Since the IO register or memory is privileged, any instruction referring to it becomes privileged. The page protection mechanism described in another article can be used to protect MMIO accesses.
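
The base-plus-offset addressing used by MMIO can be illustrated with a small simulation. This is only a sketch: the base address 0xFEBC0000, the window size 0x100, and the class and function names are hypothetical. Only the 0x30 offset of RBSTART comes from the text above. A real driver would issue an ordinary memory write to the mapped address rather than call a Python method.

```python
RBSTART_OFFSET = 0x30  # offset of the RBSTART (RxBuf) register, as given above

def mmio_address(bar_base: int, offset: int) -> int:
    """Address of a device register: start of the mapped region plus offset."""
    return bar_base + offset

class SimulatedDevice:
    """Stand-in for a device whose registers are mapped at [base, base + size)."""

    def __init__(self, base: int, size: int):
        self.base = base
        self.regs = bytearray(size)

    def _offset(self, address: int) -> int:
        offset = address - self.base
        if not 0 <= offset <= len(self.regs) - 4:
            raise ValueError("address outside the device's MMIO window")
        return offset

    def write32(self, address: int, value: int) -> None:
        """Model a 32-bit store (x86 is little-endian)."""
        offset = self._offset(address)
        self.regs[offset:offset + 4] = value.to_bytes(4, "little")

    def read32(self, address: int) -> int:
        """Model a 32-bit load."""
        offset = self._offset(address)
        return int.from_bytes(self.regs[offset:offset + 4], "little")

MA1 = 0xFEBC0000                 # hypothetical value programmed into the BAR
nic = SimulatedDevice(MA1, 0x100)  # hypothetical window size
rx_buf_reg = mmio_address(MA1, RBSTART_OFFSET)
nic.write32(rx_buf_reg, 0x12345000)  # plays the role of the MOV in the text
print(hex(rx_buf_reg), hex(nic.read32(rx_buf_reg)))
```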

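The difference between an IOPL-sensitive instruction that faults (CLI) and one that is silently ignored (POPF) can be summarized in a small model. This is a simplified sketch that tracks only the IF and IOPL bits and assumes protected mode; the function and exception names are hypothetical, and the model is not a description of the full processor behavior.

```python
IF_MASK = 1 << 9        # interrupt enable flag
IOPL_MASK = 0x3 << 12   # IO privilege level, bits 12-13

class GeneralProtectionFault(Exception):
    """Stands in for the #GP exception."""

def iopl_of(flags: int) -> int:
    return (flags & IOPL_MASK) >> 12

def cli(cpl: int, flags: int) -> int:
    """CLI clears IF when CPL <= IOPL and faults otherwise."""
    if cpl > iopl_of(flags):
        raise GeneralProtectionFault("CLI executed with CPL > IOPL")
    return flags & ~IF_MASK

def popf(cpl: int, flags: int, popped: int) -> int:
    """POPF never faults in this model; disallowed changes are dropped."""
    new_flags = popped
    if cpl != 0:
        # The IOPL can change only at CPL = 0: keep the old IOPL silently.
        new_flags = (new_flags & ~IOPL_MASK) | (flags & IOPL_MASK)
    if cpl > iopl_of(flags):
        # IF can change only when CPL <= IOPL: keep the old IF silently.
        new_flags = (new_flags & ~IF_MASK) | (flags & IF_MASK)
    return new_flags

current = 0x0202   # IF = 1, IOPL = 0
wanted = 0x3002    # tries to set IF = 0 and IOPL = 3

print(hex(popf(0, current, wanted)))  # ring 0: both changes take effect
print(hex(popf(3, current, wanted)))  # ring 3: both changes are ignored
try:
    cli(3, current)
except GeneralProtectionFault as fault:
    print("fault:", fault)
```

In this model a ring 3 POPF leaves IF and IOPL unchanged without any exception, whereas a ring 3 CLI raises the fault.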