The first process

Recall: in Lecture 3, after switching to protected mode and enabling paging, the kernel code can finally enter the world of C again (the last time was in the boot loader).

  • The last instruction in the assembly is "jmp main", which jumps to the function main() (in main.c)
  • What's the difference between jmp and call?
  • JMP -- jmp only changes the %rip.
  • CALL/RET -- "call" pushes the return address on the stack; "ret" sets the %rip to the corresponding value on the stack.
  • The kernel's main() function never returns. Jump is sufficient.
  • See the declaration of mpmain() at main.c:10. "noreturn" functions? gcc does not generate the "ret" instruction for these functions and won't warn for not returning.
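
A small user-space sketch of the "noreturn" point above. The names never_returns() and demo_noreturn() are made up for this example, and longjmp() is only a stand-in for "control leaves and never comes back through ret"; the kernel's mpmain() instead ends in the scheduler loop.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf resume_point;   /* where the demo regains control */
static int entered;            /* set once the noreturn function runs */

/* Hypothetical stand-in for a function like mpmain(): the compiler may
 * omit the trailing "ret" and does not warn that the end is unreachable. */
__attribute__((noreturn)) static void never_returns(void)
{
    entered = 1;
    longjmp(resume_point, 1);  /* leaves the function without a "ret" */
}

/* Returns 1 if never_returns() ran and control came back via longjmp. */
int demo_noreturn(void)
{
    entered = 0;
    if (setjmp(resume_point) == 0)
        never_returns();       /* nothing after this call is reached */
    assert(entered == 1);
    return entered;
}
```

Since the caller never gets control back through a return address, entering such a function with jmp instead of call loses nothing.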

Initialization on the first CPU:

  • free list of physical memory pages; a new kernel page table (the one set up in the assembly is replaced, again)
  • Detect other CPUs, if any.
  • interrupt controller (a timer interrupt is configured; the controller is then enabled, but interrupts on the CPU have not yet been enabled. Why?)
  • -- the CPU's interrupt handlers have not been properly configured yet. The interrupt controller generates the interrupts, and the CPU handles them.
  • trap vectors, for what? -- interrupt handlers

Jump-start other CPUs

  • code = P2V(0x7000) -- the main CPU now prepares some code for the other CPUs. The other CPUs will start at a low physical address (0x7000), but the main CPU accesses memory through kernel (high) virtual addresses, so it uses P2V() to get the destination address for the copy.
  • Where is the kernel image loaded? -- phy. mem. 0x100000 (1MB)
  • The other processors start in 16-bit mode, so they cannot immediately access code above 1MB
  • the first CPU copies a small piece of code into some unused low address and calls lapicstartap() -- it activates a CPU (AP).
  • the symbol "_binary_entryother_start" is generated by the linker, so you won't find the exact symbol in the source code files
  • symbol format: _binary_<filename>_<symbol>
  • The start-up code for the AP is at the beginning of entryother.S -- read the file header for more information
  • the AP finally calls entry32mp, which the main CPU prepared in the "unused" memory just below "code"
  • read some interesting comments in lapicstartap()
  • AP sets started to 1 in mpmain(), which allows the main CPU to continue to activate other CPUs.
  • mpmain() finally calls scheduler(), which will loop forever to schedule user processes to run on that CPU.
  • Is there a process for the AP(s) to run? -- at the beginning only the init process exists, so it depends on which CPU's scheduler thread takes the process first.
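
A minimal sketch of the P2V() idea used above, assuming the kernel maps physical memory at a fixed offset. The KERNBASE_SKETCH value and the *_sketch names are assumptions for illustration; check memlayout.h for the real definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: the notes only say the kernel uses a "high address". */
#define KERNBASE_SKETCH 0xFFFFFFFF80000000ULL

/* Physical address where the APs begin executing (from the notes). */
#define AP_START_PA 0x7000ULL

/* Physical -> kernel virtual, assuming a fixed linear mapping. */
static inline uint64_t p2v_sketch(uint64_t pa)
{
    return pa + KERNBASE_SKETCH;
}

/* Kernel virtual -> physical, the inverse of p2v_sketch(). */
static inline uint64_t v2p_sketch(uint64_t va)
{
    assert(va >= KERNBASE_SKETCH);
    return va - KERNBASE_SKETCH;
}

/* The address the main CPU writes through when copying the AP code:
 * same bytes of RAM as AP_START_PA, seen via the kernel mapping. */
uint64_t ap_code_va(void)
{
    return p2v_sketch(AP_START_PA);
}
```

The AP, still in 16-bit mode with paging off, sees the same bytes at 0x7000.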

Preparing the first process

Memory allocation is available:

  • kinit1() and kinit2() add about 224MB of physical memory to the free list
  • After that, the kernel can call kalloc() and kfree() to allocate and free memory in units of pages.
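
A user-space sketch of a page free list in the spirit of kalloc()/kfree(). The arena size, the *_sketch names, and the lack of locking are simplifications for illustration, not the kernel's actual code.

```c
#include <assert.h>
#include <stddef.h>

#define PGSIZE 4096
#define NPAGES 8            /* tiny arena for the sketch */

/* A free page stores the link to the next free page in its own bytes. */
struct run { struct run *next; };

union page { struct run link; unsigned char bytes[PGSIZE]; };

static union page arena[NPAGES];   /* stands in for physical memory */
static struct run *freelist;

/* Put one page back on the free list (push at the head). */
void kfree_sketch(void *page)
{
    assert(page != NULL);
    struct run *r = page;
    r->next = freelist;
    freelist = r;
}

/* Take one page off the free list; returns NULL when out of memory. */
void *kalloc_sketch(void)
{
    struct run *r = freelist;
    if (r != NULL)
        freelist = r->next;
    return r;
}

/* Like kinit1()/kinit2(): add every page of the range to the free list. */
void kinit_sketch(void)
{
    freelist = NULL;
    for (int i = 0; i < NPAGES; i++)
        kfree_sketch(&arena[i]);
}
```

Because the list is a stack, the most recently freed page is the next one handed out.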

The main CPU will:

  • create a user process with userinit()
  • call mpmain(), which eventually calls scheduler().

userinit() at proc.c

  • struct proc * for the init process
  • allocproc() sets up the kernel-side stack for switching (actually "returning") to user space with sysret.
  • setupkvm() -- create user page table based on the kernel page table. Why? -- when trapped into the kernel, the kernel relies on the "kernel part" of the page table to run. So in a syscall the kernel does not need to switch to a "kernel-only" page table.
  • inituvm() -- sets up the user portion of the memory: it copies the initcode (source code in initcode.S) to the user memory at 0x1000 (the second 4KB).
  • The first 4KB (0x0 to 0xFFF) is used as the stack (tf->rsp = p->sz).
  • User-space execution will start at 0x1000 (tf->rcx = PGSIZE). sysret will restore the user rip with this value.
  • Kernel will "return" to the first user context using this "handcrafted" trapframe.
  • Set the process state to RUNNABLE (ready to be activated by the scheduler) -- what are the other states? RUNNING/SLEEPING/ZOMBIE
  • While it is the kernel that sets a user process to RUNNING, the process is not immediately running. The kernel will then surrender the CPU to the user process.
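
A sketch of the "handcrafted" trapframe described above. The struct layouts are reduced to the fields mentioned in the notes, the *_sketch names are hypothetical, and the value of p->sz is passed in because the notes do not state it. Setting r11 is an assumption based on how sysret works (it reloads rflags from r11).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PGSIZE 4096ULL
#define FL_IF  0x200ULL   /* rflags interrupt-enable bit */

enum procstate { UNUSED, EMBRYO, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

/* Only the registers this sketch needs; the real trapframe has more. */
struct trapframe_sketch {
    uint64_t rcx;   /* sysret loads the user rip from rcx */
    uint64_t r11;   /* sysret loads the user rflags from r11 */
    uint64_t rsp;   /* user stack pointer */
};

struct proc_sketch {
    uint64_t sz;                 /* size of user memory */
    enum procstate state;
    struct trapframe_sketch tf;
};

/* Fill in the fake user context for the first process. */
void userinit_sketch(struct proc_sketch *p, uint64_t sz)
{
    assert(p != NULL);
    memset(p, 0, sizeof *p);
    p->sz = sz;
    p->tf.rcx = PGSIZE;      /* user code starts at 0x1000 */
    p->tf.r11 = FL_IF;       /* assumed: run user code with interrupts on */
    p->tf.rsp = p->sz;       /* as in the notes: tf->rsp = p->sz */
    p->state = RUNNABLE;     /* the scheduler may now pick it */
}
```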

The first process' kernel stack layout (above KERNBASE)

low address                                          high address
----------------------------------------------------------------+
......| struct context     |  X  | struct trapframe             |stacktop
-----------------------^^^----^-----------------------^^^-------+
              context->rip == forkret
                              X == syscall_trapret
                                                  tf->rcx == 0x1000 (initcode.S:start at user space)
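
The layout in the diagram can be built by carving the stack from the top down. A sketch under simplifying assumptions: the structs are reduced stand-ins (the real ones have more registers), the addresses are passed in as plain numbers, and build_kstack() is a hypothetical name, not the code in allocproc().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KSTACK_WORDS 512   /* 4096-byte kernel stack as 64-bit words */

struct tf_stub  { uint64_t rcx, r11, rsp; };   /* reduced trapframe */
struct ctx_stub { uint64_t rbx, rbp, rip; };   /* reduced context, rip last */

struct kstack_layout {
    struct ctx_stub *ctx;   /* lowest address */
    uint64_t *x;            /* the "X" slot in the diagram */
    struct tf_stub *tf;     /* ends exactly at stacktop */
};

struct kstack_layout build_kstack(uint64_t *stack, uint64_t forkret_addr,
                                  uint64_t trapret_addr, uint64_t user_entry)
{
    struct kstack_layout l;
    uint64_t *sp = stack + KSTACK_WORDS;             /* stacktop */

    assert(stack != NULL);

    sp -= sizeof(struct tf_stub) / sizeof(uint64_t); /* room for trapframe */
    l.tf = (struct tf_stub *)sp;
    memset(l.tf, 0, sizeof *l.tf);
    l.tf->rcx = user_entry;                          /* e.g. 0x1000 */

    sp -= 1;                                         /* X: where forkret "returns" */
    l.x = sp;
    *l.x = trapret_addr;

    sp -= sizeof(struct ctx_stub) / sizeof(uint64_t);/* room for context */
    l.ctx = (struct ctx_stub *)sp;
    memset(l.ctx, 0, sizeof *l.ctx);
    l.ctx->rip = forkret_addr;                       /* first switch lands here */

    return l;
}
```

With rip as the last field of the context, popping the context leaves the stack pointer at X, so the ret at the end of forkret() picks up the second address.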

After switching to this kernel stack:

  • call forkret() -- it releases the scheduler lock.
  • when returning from forkret(), the ret instruction uses the "X" above to jump to syscall_trapret.
  • syscall_trapret will restore the "fake" user registers and finally use sysret to start user-space execution.
  • starts to run from initcode.S in user mode.

If "returning from a function" confuses you: the ret instruction is just a special jmp (jump) instruction.

pseudo code of what ret does:

  • rip = *rsp
  • rsp += sizeof(rip)

By storing the addresses of different functions on the stack and executing ret with %rsp pointing to the left-most one, multiple functions will be called one after another, from left to right.
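
The chaining effect can be imitated in plain C. This is a toy model, not real machine behaviour: function pointers stand in for return addresses, cpu_sketch and run_chain() are hypothetical names, and each function's own final ret is modeled by the next loop iteration.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*code_addr)(void);

static char trace[8];       /* records the order in which functions run */
static size_t trace_len;

static void record(char c)
{
    assert(trace_len < sizeof trace - 1);
    trace[trace_len++] = c;
    trace[trace_len] = '\0';
}

static void fn_a(void) { record('A'); }
static void fn_b(void) { record('B'); }
static void fn_c(void) { record('C'); }

struct cpu_sketch { code_addr rip; code_addr *rsp; };

/* The pseudo code of ret: rip = *rsp; rsp += sizeof(rip). */
static void ret_sketch(struct cpu_sketch *cpu)
{
    cpu->rip = *cpu->rsp;
    cpu->rsp += 1;
}

/* Lay out three "return addresses" left to right and keep executing ret. */
const char *run_chain(void)
{
    code_addr stack[] = { fn_a, fn_b, fn_c, NULL };  /* NULL ends the demo */
    struct cpu_sketch cpu = { NULL, stack };

    trace_len = 0;
    trace[0] = '\0';
    for (;;) {
        ret_sketch(&cpu);
        if (cpu.rip == NULL)
            break;
        cpu.rip();          /* run the function that ret jumped to */
    }
    return trace;
}
```

The first process starts the same way: forkret and syscall_trapret are the two "functions" laid out on its kernel stack.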

Some ABIs use the stack to pass arguments to functions, so on these systems arguments can also be prepared on the stack for the chained calls.

This feature can be used to facilitate buffer-overflow attacks:

  • Return-to-libc attack
  • Return-oriented programming