Process
Overview
processes (state, creation, termination)
(logical) layout of the program image
memory allocation/deallocation (malloc family of memory allocation functions)
Process
a process is a program in execution
executable module of program resides in a program image in main memory
each process id (do a ps in UNIX) and a process state (ready, running, blocked, and so on)
Logical layout of program image
(modified version of [USP] Fig. 2.1, p. 24; image courtesy Robbins & Robbins' [USP] Chapter 2 webpage)
Activation record
(modified version of [USP] Fig. 1.1, p. 16)
A stack of these activation records at run-time is called the execution stack
Process control block
For the duration of its existence, each process is represented by a process control block (PCB) which contains information such as:
state of the process (ready, running, blocked)
register values
priorities and scheduling information
memory management information
accounting information such as open files
allocated resources
Specifics of a PCB depend on the system and techniques used.
(ref. [OSIDP] Fig. 3.1 on p. 110; image courtesy [OSIDP] webpage)
Process life cycle
A process moves through various states during its life:
new,
ready,
running,
blocked, and
done states, and
transitions between them
(regenerated from [USP] Fig. 3.1, p. 62)
Many processes may be ready or waiting at the same time, but only one can be running at a time in a uniprocessor system.
Process and process lifecycle nomenclature
batch process: a process which is not connected to a terminal, or a non-interactive process
resident monitor: first software support for an operating system; transferred CPU control from process to process; provided automatic process sequencing
multiprogramming: multiple processes are kept in main memory and contend for the CPU with the goal of reducing the time the processor is idle
timesharing (or preemptive multi-tasking): an extension of multiprogramming in which the CPU is switched between processes very quickly; a small amount of CPU time is given to each process (called a time slice or quantum); gives rise to interactive systems and enables communication between a user and a process
non-preemptive multi-tasking = multiprogramming without timesharing
job scheduling: policy by which processes are loaded into main memory or, in other words, the ready queue
ready queue: contains all processes in main memory ready to execute; a process is dispatched from the ready queue to the CPU (called process scheduling); its processing may cause it to put on a device queue
process scheduling (or CPU scheduling): policy by which processes are granted access to the CPU
context switch: each process change requires a context switch which is purely overhead (i.e., no useful work is being accomplished); switching the CPU to another process requires i) saving the state of the old process and ii) loading the state for the new process
context switch time: time required to perform a context switch
system call: call to a core OS service, such as a read or write, and typically triggers a context switch;
`a system call is a request to the operating system for a service' ([USP] p. 119)
or
`a system call is a request to the operating system for service that causes the normal CPU cycle to be interrupted and control to be given to the operating system' ([USP] p. 7)
a system call, unlike a library call, often involves a context-switch; library calls are usually jackets for system calls (see [USP] exercise 4.19, p. 119 for more information).
interrupt (hardware): a hardware notification of an event, such as completion of I/O (e.g., read or write), error condition (e.g., divide by zero), request for an OS service (e.g., in response to a system call), typically generated by hardware; once loaded, an OS waits for an event to occur; events are signaled by interrupts; therefore, an OS is event-driven (or interrupt-driven) system
interrupt service routine: when an interrupt is generated, control is transferred to an interrupt service routine, which deals with the particular situation; after the interrupt is processed, control is returned to the interrupted process
(asynchronous or synchronous) signal (software): a software notification of an event
asynchronous event: any event which can occur at any time; interrupts and signals are asynchronous events in that there is no way to predict when they will occur
synchronous event
device driver: a program which identifies with and deals with a device (usually sends a signal in response to an interrupt)
UNIX is a multiprogramming and timeshared OS.
System calls cause context switches
(ref. [OSIDP] Fig. 1.5 on p. 16; image courtesy [OSIDP] webpage)
Process scheduling
allocating the CPU to a different process to reduce idle time
each process change requires a context switch
a context switch is pure overhead (i.e., involves no useful work)
Process identification
getuid, getgid, geteuid, getegid
process id, parent process id
getpid and getppid
real vs. effective user and group ids
cast results of these functions to long
output of ps command provides many of these details, including process state; experiment with the -a, -A, -l, and -o options
top
Process creation
an initial OS process begins running at boot time, which spawns (creates) other processes
a process hierarchy (tree) evolves with parent-child relationships
fork
system call used for process creation in UNIX
caller becomes parent and newly created process is child
processes are the same except for the process id
child inherits many attributes of its parent (e.g., environment, privileges, open files and devices)
returns 0 to the child and the child pid to the parent
execution proceeds after the call to fork in each process image
parent code and child code
effect of concurrency
sleep function, takes seconds to block as an int (e.g., sleep(30))
Graphical depiction of fork
(ref. [ATT] 6-11)
Process termination
returned to OS (in shell variable $? in UNIX system)
why 0 for success?
does not seem to follow C boolean convention
many ways to fail; see bottom of manpage for meanings of particular exit status
echo is the UNIX print command
for instance, echo hello world, echo $?
exit vs. return
in main they are the same
outside of main
exit can be used to exit the program (from any function)
return will transfer control to the caller
absence of return or exit in main has the same effect as exit(0)
to install an exit handler use atexit
how can a process terminate abnormally? external event (e.g., a <ctrl-c> (an asynchronous event)) or an internal error (e.g., illegal memory access (a synchronous event))
NULL pointer
it is illegal to read or write to address 0
a null pointer
any pointer with value 0
returned by pointer-returning functions to signify failure
NULL is defined in stdio.h as
#define NULL 0, or
#define NULL (void*) 0 in the new ANSI standard
not to be confused with the null character \0
Memory allocation/deallocation
malloc, calloc, realloc, and free
a general-purpose memory allocation package
prototyped in stdlib.h
malloc family allocates storage from the heap
malloc returns a pointer to a block of at least size bytes suitably aligned for any use
malloc returns a pointer of type void*; cast to appropriate type
to declare and array of 10 ints: int* int_ptr = (int*) malloc (sizeof(int)*10);
PCB* process_str_ptr = (PCB*) malloc (sizeof(PCB));
the argument to free is a pointer to a block previously allocated by malloc
after free is executed, this space is made available for further allocation by the application, though not returned to the system; memory is returned to the system only upon termination of the process (there is more to the story than this)
when passed NULL, recalloc acts like malloc
when passed a non-NULL pointer and a size of 0, recalloc acts like free
References
Threads
(ref. [OSIDP] Fig. 4.1 on p. 162; image courtesy [OSIDP] webpage)
Introduction
A thread is
an ADT within a process
has its own stack, program counter value, register set, and state
share process resources
(heavyweight) process vs. (lightweight) thread
We also have a thread control block (TCB).
Relationship between process and thread states
(ref. [OSIDP] Fig. 4.7 on p. 170; image courtesy [OSIDP] webpage)
variables are either static (exist for the life of the process) or automatic (allocated/deallocated on block entry/exit)
Declarations and definitions
defining a variable entails allocating storage for it (once per program)
declaring a variable entails announcing its type so that it can be used
every definition is also a declaration (e.g., the definition int x; is also a declaration)
some declarations are not definitions (e.g., the declaration extern int x; is not a definition)
extern modifier in C
/* x.c */ int x = 10; #include<stdio.h> /* main.c */ extern int x; main() { printf("%d\n", x); }
Storage and linkage classes
storage classes
static storage: once allocated, persists throughout lifetime of program (e.g., a global variable)
automatic storage: allocated and deallocated on block/function entry and exit, resp. (e.g., a variable local to a function)
linkage classes
internal linkage: visible only within defining file
external linkage: visible outside of defining file as well as within it
statics and externals have 0 as a default value
automatics have no default value
Storage class summary
notes:
speed advantage
for function arguments, value passed
can be accessed from other files
cannot be accessed from other files
(ref. [C])
static modifier in C
can you write an object-oriented program in C?
before a variable within a function
if present, static storage
if absent, automatic storage (default)
before a variable outside of a function
if present, internal linkage (like private in OO languages)
if absent, external linkage (default)
before a function
if present, internal linkage (like private in OO languages)
if absent, external linkage (default)
static modifier summary
(courtesy Robbins & Robbins' [USP] Chapter 2 webpage)
Summary of static reserved word
static keyword used in a variable declaration:
outside of any function:
/* x is static data and allocated in the static region of the memory image, and it has external linkage */ /* linkage class: ? storage class: ? */ int x; /* x is STILL static data, but now has internal linkage and thus cannot be referenced by another module (.o file) */ /* akin to "private" in C++ or Java */ /* linkage class: ? storage class: ? */ static int x;
inside of any function:
void f() { /* x is allocated on the stack (i.e., it is not static data) and this particular x can only be referenced within the body of this function */ /* linkage class: ? storage class: ? */ int x; /* x is now static data and allocated in the static region of the memory image */ /* linkage class: ? storage class: ? */ static int x; }
static keyword used in a function definition/declaration:
/* f() has external linkage and thus can be referenced by another module (i.e., .o file) */ /* linkage class: ? storage class: ? */ void f(); /* f() has internal linkage and thus cannot be referenced by another module (i.e., .o file) */ /* linkage class: ? storage class: ? */ void static f();
Bubblesort storage class exercise
for each object and function in the following code, give the storage and linkage class where appropriate
ref. [USP] program 2.5 (p. 41) and exercise 2.20 (p. 42)
/* a function which sorts an array of integers and counts the number of interchanges made in the process */ static int count = 0; /* return true if interchanges are made */ static int onepass (int a[], int n) { int i; int interchanges = 0; int temp; for (i = 0; i < n-1; i++) if (a[i] > a[i+1]) { temp = a[i]; a[i] = a[i+1]; a[i+1] = temp; interchanges = 1; count++; } return interchanges; } void clearcount (void) { count = 0; } int getcount (void) { return count; } /* sort a in ascending order */ void bubblesort (int a[], int n) { int i; for (i = 0; i < n-1; i++) if (!onepass (a, n-i)) break; }
count is a static variable with internal linkage (note use of static modifier for linkage, not storage)
all other variables use automatic storage with no linkage
onepass does not have a storage class; its linkage is internal
all other functions have external linkage; functions do not have a storage class
Synchronization
Thread-safe functions
static variables make the use of multiple threads unsafe
char* strtok (char* restrict s1, const char* restrict delimiter);
first and subsequent calls are different
tokenizes string in place (i.e., does not allocate new space for tokens)
strtok is not thread-safe because it uses a static pointer
graphical depiction of string before and after call to strtok
before:
(regenerated with minor modifications from [USP] Fig. 2.3, p. 36)
after:
(regenerated with minor modifications from [USP] Fig. 2.4, p. 36)
index into the string being tokenized (e.g., wordcount, wordaverage)
use POSIX thread-safe function strtok_r (_r stands for reentrant)
moral of the story: avoid using static storage
makeargv
ref. [USP] §2.6 (pp. 31-38)
three versions
char** makeargv (char* s);
int makeargv (char* s, char*** argvp);
int makeargv (const char* s, const char* delimiters, char*** argvp);
for instance, ./a.out -c tom and -m jerry
importance of cleaning up after yourself; [USP] example 2.19 on p. 38 (freemakeargv.c)
[USP] exercise 2.24 on p. 51 (getpaths.c)
Self-study
re-read [USP] §2.9 (pp. 42-48)
download, compile, and run codes
convince yourself you understand how the list works
exercise in implementing the third version of makeargv