Resource Acquisition Is Initialization (RAII) is a design pattern where the acquisition and release of a resource are bound to the constructor and destructor of a class.
C++ Standardization: C++11 added many features, such as (a quick sketch follows this list):
A type-safe nullptr literal to replace C's bug-prone NULL macro.
auto and decltype for type inference.
constexpr for defining compile-time constant values.
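A tiny sketch exercising each of these (the identifiers are invented for illustration):

```cpp
#include <vector>

constexpr int kMaxPlayers = 4;          // constexpr: a compile-time constant value
static_assert(kMaxPlayers > 0, "need at least one player");

int* Find(std::vector<int>& v, int value)
{
    for (auto it = v.begin(); it != v.end(); ++it)   // auto: iterator type deduced
        if (*it == value)
            return &*it;                // address of the matching element
    return nullptr;                     // nullptr: type-safe null, unlike the old NULL macro
}

int g_score = 0;
decltype(g_score) g_highScore = 0;      // decltype: "whatever type g_score has" (int)
```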
3.3.2 Primitive Data Types:
char: usually 8 bits; some compilers treat a plain char as signed by default while others treat it as unsigned.
int, short, long: an int is usually defined to be 32 or 64 bits wide according to the CPU architecture (it's meant to hold the target's natural word size); short is smaller and long is at least as wide as int.
float: a single-precision floating-point value, 32 bits on most modern compilers.
double: a double-precision floating-point value, 64 bits.
bool: a true/false value. The size of a bool varies widely across different compilers and hardware architectures. It is never implemented as a single bit; some compilers define it to be 8 bits while others use a full 32 bits.
Portable Sized Types: C++11's <cstdint> header provides explicitly sized integer types (std::int8_t, std::uint32_t, etc.) whose widths are the same on every platform. Also worth remembering: the CPU is usually faster when working in single-precision floating-point math than in double precision.
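A small sketch of the sized types in use (variable names are mine):

```cpp
#include <cstdint>

std::int8_t   health   = 100;   // exactly 8 bits, signed
std::uint16_t entityId = 0;     // exactly 16 bits, unsigned
std::int32_t  score    = -50;   // exactly 32 bits, signed
std::uint64_t frame    = 0;     // exactly 64 bits, unsigned

// The sized types make width assumptions explicit and checkable:
static_assert(sizeof(std::int32_t) == 4, "int32_t must be 4 bytes");
```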
3.3.2.1 Multibyte Values and Endianness:
Values that are larger than 8 bits wide (1 byte) are called multibyte quantities
Multibyte integers can be stored in memory in one of two ways:
Little-endian: If a microprocessor stores the LSB at a lower mem address than the MSB.
Big-endian: If a microprocessor stores the MSB at a lower mem address than the LSB.
Endianness can be a problem in game dev because of the difference between the development machine and the target machine, e.g., consoles (many of which have historically been big-endian PowerPC machines). For example, if we write a data file on a little-endian machine and the game then runs on a big-endian machine, it will read every multibyte value with its bytes reversed, which of course is not what we want. One solution is to waste some disk space and store all multibyte numbers as a sequence of decimal or hex digits, one byte per digit. A better solution is to endian-swap the data when writing it, thus making sure the endianness in the file matches the target machine.
Endian-Swapping: for integers it's fairly straightforward, but with floats it's trickier because of the internal structure of a floating-point value (mantissa, exponent and sign bit): you have to swap the raw bytes, not operate on the numeric value. reinterpret_cast-style type punning is a bad idea when strict aliasing is enabled; a union is the commonly used workaround and works on practically every compiler, though strictly speaking the portable way in C++ is to copy the bits with std::memcpy.
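As a sketch (function names are mine, not from the book), a 32-bit integer swap plus a float swap that uses std::memcpy to sidestep strict-aliasing problems might look like this:

```cpp
#include <cstdint>
#include <cstring>

// Reverse the byte order of a 32-bit unsigned integer.
inline std::uint32_t swapU32(std::uint32_t value)
{
    return ((value & 0x000000FFu) << 24)
         | ((value & 0x0000FF00u) << 8)
         | ((value & 0x00FF0000u) >> 8)
         | ((value & 0xFF000000u) >> 24);
}

// Swap a float by copying its raw bits into an integer, swapping,
// and copying back; std::memcpy avoids aliasing the float as an int.
inline float swapF32(float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    bits = swapU32(bits);
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}
```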
3.3.3 Kilo VS Kibi
Metric (SI) prefixes are powers of 10 (kilo = 1000), whereas the IEC prefixes are powers of 2 (kibi = 1024), which is the more precise way to talk about memory sizes: e.g., 512 KiB = 512 × 1024 = 524,288 bytes, while 512 kB = 512,000 bytes.
3.3.4.1 Translation Units Revisited:
The compiler translates one .cpp file (translation unit) at a time, and for each one it generates a .o or .obj file, which may contain unresolved references to functions/variables defined in other .cpp files. It's the linker's job to combine all object files into one executable by resolving all of the cross-references between them.
Linker Errors:
Only 2 kinds (unresolved external symbol and multiply defined symbol), the first one happens when the target of an extern ref is not found, the second one happens when the linker finds more than one var/func with the same name.
3.3.4.2 Declaration VS Definition:
Declaration is a reference to an entity, while Definition is the entity itself. In more detail, a declaration describes a data object or function: it provides the compiler with the entity's name and its data type or function signature. A definition, on the other hand, describes a unique region of memory in the program (a small example appears a few lines below).
Multiplicity of Declaration and Definition:
Any var/func can have multiple declarations but each can have only one definition. If two or more identical definitions exist in a single translation unit, the compiler will notice and flag an error. If two or more identical definitions exist in different translation units, the compiler will not complain because it operates on one translation unit at a time. However, the linker will give us a "multiply defined symbol" error.
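A minimal illustration of declaration versus definition (file and symbol names are made up):

```cpp
// foo.h: declarations only; no storage is reserved here.
extern int g_count;          // declaration of a variable defined elsewhere
int ComputeDamage(int base); // declaration (prototype) of a function

// foo.cpp: definitions; these occupy memory in the program.
int g_count = 0;             // the one and only definition of the variable
int ComputeDamage(int base)  // the one and only definition of the function
{
    return base * 2;
}
```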
Definition in Header Files and Inlining:
Generating a "multiply defined symbol" 101: just place your definitions in the header file ezz. Away from the joke, it's dangerous to do that because if that header file is included in more than one .cpp file ,which happens a lot, it's guranteed to have that linker error as stated above. But an exception to this rule is the inline func because each invocation of an inline func is a brand new copy of its code embedded directly into the calling func. In fact inline funcs must be defined in header files.
3.3.4.3 Linkage:
After compilation is done, there is no such thing as variable scope.
By default, definitions have external linkage. The static keyword is used to change a definition's linkage to internal, which means it can only be seen in the translation unit in which it is defined.
3.3.5.1 Executable Image:
When a C/C++ program is built, the linker creates an executable file. I've seen two types so far: the widely known .exe (Portable Executable) on Windows, and the ELF file format (Executable and Linkable Format), which is popular on UNIX-like operating systems, including many game consoles.
The executable image is divided into contiguous blocks called segments or sections.
The image is usually composed of at least these four segments:
Text Segment: AKA code segment, contains executable machine code for all funcs.
Data Segment: contains all initialized global and static vars.
BSS Segment: (Block Started by Symbol) contains all uninitialized global and static vars
Read-Only Data Segment: AKA rodata, contains read-only data (constants of all types). Integer constants can be an exception: they are often used as manifest constants by the compiler, meaning they are inserted directly into the machine code and thus occupy storage in the text segment rather than the rodata segment.
Global vars defined at file scope are stored in either the data or BSS segments depending on whether or not they have been initialized.
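A rough sketch of where things typically land (the names are mine, and exact placement is compiler/platform dependent):

```cpp
int   g_score = 100;          // initialized global       -> data segment
int   g_lives;                // uninitialized global     -> BSS segment
const char g_name[] = "hero"; // read-only constant data  -> typically rodata

void Update()                 // Update()'s machine code  -> text segment
{
    static int s_calls = 0;   // initialized static local -> data segment
    ++s_calls;

    int hp = 0;               // automatic (local) var    -> program stack, not the image
    (void)hp;
}
```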
3.3.5.2 Program Stack:
When an executable is loaded and run, the OS reserves an area of memory for the program stack. Whenever a function is called, a stack frame is pushed onto the stack. A stack frame is a contiguous area of stack memory. A stack frame stores 3 kinds of data:
The return address of the calling func.
The contents of CPU registers.
All local vars AKA automatic variables.
Stack frames are pushed and popped by adjusting the stack pointer, a CPU register that always points to the top of the stack.
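A tiny illustration of nested calls and the frames they imply (the functions are invented for the example):

```cpp
int C(int x) { int local = x + 1; return local; }  // innermost frame: holds x and local
int B(int x) { return C(x) * 2; }                  // B's frame holds x and the return address into A
int A(int x) { return B(x) + 3; }

// Calling A(1) pushes a frame for A, then B, then C onto the program stack;
// each frame stores its caller's return address, any saved CPU registers and
// the function's local (automatic) variables, and is popped on return.
```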
3.3.6.1 Class-Static Members:
The static keyword has different meanings depending on context:
At file scope = internal linkage.
At function scope = the variable has static storage duration (it persists across calls like a global) rather than automatic, but its name can only be seen inside this function.
Inside a class or struct = the variable is not a regular per-instance member; it acts like a global shared by all instances of the class, and must be defined exactly once in a .cpp file.
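A compact sketch of the three contexts (identifiers invented):

```cpp
// stats.cpp (a made-up translation unit)

static int s_fileLocal = 0;       // file scope: internal linkage,
                                  // invisible to other translation units

int CountCalls()
{
    static int s_calls = 0;       // function scope: static storage duration,
    ++s_fileLocal;                // so the value persists between calls
    return ++s_calls;
}

struct RenderStats
{
    static int s_drawCalls;       // class-static: one shared copy, not per-instance
};

int RenderStats::s_drawCalls = 0; // the class-static member still needs exactly
                                  // one definition in a .cpp file
```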
3.3.7.1 Alignment and Packing:
Each data type has an alignment requirement which must be respected in order to permit the CPU to read and write memory efficiently.
The alignment of a data object refers to whether its address in memory is a multiple of its alignment requirement, which for primitive types is typically equal to the type's size (e.g., a 32-bit int wants a 4-byte-aligned address).
The compiler will leave holes (padding) in a struct's memory layout to keep each member aligned, but some compilers can be asked not to leave these holes via the preprocessor directive #pragma pack. Manually rearranging data members can eliminate some of the padding the compiler would otherwise insert, and the safest way to pack and align is to add any remaining padding explicitly yourself.
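A sketch of the idea (struct names are mine; exact sizes depend on the compiler, hence the "typically" comments):

```cpp
#include <cstdint>

struct Wasteful           // members in a poor order
{
    std::uint8_t  a;      // 1 byte + 3 bytes of padding
    std::uint32_t b;      // 4 bytes (must be 4-byte aligned)
    std::uint8_t  c;      // 1 byte + 3 bytes of trailing padding
};                        // sizeof(Wasteful) is typically 12

struct Packed             // same members, rearranged, with explicit padding
{
    std::uint32_t b;      // 4 bytes
    std::uint8_t  a;      // 1 byte
    std::uint8_t  c;      // 1 byte
    std::uint8_t  pad[2]; // explicit padding makes the layout obvious
};                        // sizeof(Packed) is typically 8

static_assert(sizeof(Packed) <= sizeof(Wasteful), "rearranging saves space");
```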
3.3.7.2 Memory Layout of C++ Classes:
When class B inherits from class A, B's data members are stored immediately after A's members in memory and each new derived class simply tacks its data members on at the end.
If a class declares or inherits one or more virtual functions, then an additional 4 or 8 bytes (depending on the target hardware) are added at the beginning of the class's memory layout: the virtual table pointer or vpointer, which is a pointer to the virtual function table or vtable.
The vtable contains pointers to all the virtual functions that are declared or inherited by a particular class. Each class has its own vtable, and every instance of that class holds a pointer to it in its vpointer.
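A small sketch (class names are mine; exact layout details are compiler specific):

```cpp
struct Shape                 // has a virtual function, so every Shape
{                            // instance begins with a hidden vpointer
    virtual float Area() const { return 0.0f; }
    float m_x, m_y;
};

struct Circle : Shape        // Circle's members are laid out after Shape's;
{                            // it reuses Shape's vpointer slot, which now
    float m_radius;          // points at Circle's vtable
    float Area() const override { return 3.14159f * m_radius * m_radius; }
};

// On a 64-bit target, sizeof(Shape) is typically 16:
// 8 bytes of vpointer plus two 4-byte floats.
```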
The von Neumann architecture: CPU, bank of memory connected on a motherboard via buses and connected to external peripheral devices by means of IO Ports and/or expansion slots.
The CPU, the brains of the computer, consists of:
Arithmetic/logic unit (ALU): for integer arithmetic and bit shifting (unary and binary).
Floating-point unit (FPU): for floating-point arithmetic
Vector processing unit (VPU): capable of performing integer and floating-point operations on multiple data items in parallel (a combination ALU/FPU), AKA SIMD (single instruction, multiple data). Many of today's CPUs no longer contain a separate FPU; instead, floating-point operations are performed by the VPU (e.g., scalar float math on x86-64 goes through the SSE/AVX vector registers).
Memory Controller (MC) or Memory Management Unit (MMU): an interface with on-chip and off-chip memory devices.
Registers: high-speed memory cells used for temporary storage during calculations.
Control Unit (CU): the brains of the CPU, used for decoding and dispatching machine language instructions.
Every digital electronic circuit is essentially a state machine. And its states can be changed by a digital signal which might be provided by changing the voltage on a line in that circuit.
State changes within a CPU are typically driven by a periodic square wave signal known as the system clock. Each rising or falling edge of this signal is known as a clock cycle, and the CPU can perform at least one primitive operation on every cycle; thus, from the CPU's point of view, time appears to be quantized.
Memory: acts like a bank of cells, with each cell containing a single byte of data.
Memory comes in two basic flavors:
read-only memory (ROM): retains its data even when no power is applied. Some can be programmed only once, others can be reprogrammed over and over again, i.e., electrically erasable programmable ROM (EEPROM), of which flash memory is an example.
read/write memory or random access memory (RAM): can be further divided into SRAM, DRAM. Both retain their data as long as power is applied to them but DRAM also needs to be refreshed periodically by reading the data then re-writing it. This is because DRAM cells are built from MOS capacitors that gradually lose their charge.
A typical PC contains two buses: an address bus and a data bus.
The CPU loads data from memory (the MC reads an address from the address bus and puts the corresponding data onto the data bus, where it is seen by the CPU). The same method, in reverse, is used to write data.
The Bus Width: the width of the address bus controls the size of addressable memory in the machine, while the width of the data bus controls how much data can be transferred between CPU registers and memory
Words: "word" is used to describe a multibyte value, but its size varies and depends on the context in which it's defined. For example, in the Windows API a WORD refers to the smallest multibyte value, 2 bytes (16 bits), making a double word (DWORD) 4 bytes (32 bits) and a quad word (QWORD) 8 bytes (64 bits). In another context, for a particular machine, a word may refer to the natural size of data: e.g., a machine with 32-bit registers and a 32-bit data bus has a word size of 32 bits.
Instruction Set Architecture (ISA): the set of instructions supported by a CPU. Instructions that are common to pretty much every ISA:
Move: to move data between registers or between memory and registers.
Arithmetic operations: add, subtract, multiply, divide (and sometimes square root, negation, etc.).
Bitwise operations: AND, OR, XOR (also called EOR), bitwise complement.
Shift/rotate operations: shift or rotate the bits within a data word.
Comparison: compare two values and set CPU status flags accordingly.
Jump and branch: change program flow by storing a new address into the instruction pointer, either unconditionally (jump) or based on status flags (branch).
Push and pop: onto/from program stack.
Function call and return: some ISAs provide separate instructions for function call/return, but these can also be implemented using push/pop and jump instructions.
Interrupts: raise a signal within the CPU that causes it to jump to a pre-installed interrupt service routine, which is how the OS/program is notified of events.
Machine Language:
A ML instruction is comprised of 3 basic parts:
opcode: which tells the CPU which operation to perform.
operands: I/O for the instruction.
options field: flags, addressing mode...
Addressing Mode(of an instruction): the way in which an instruction's operands are interpreted and used by the CPU.
Assembly Language: well, you know it... (human-readable mnemonics that map more or less one-to-one onto machine language instructions).
Addressing Modes:
Register Addressing: This mode allows values to be transferred from one register to another.
Immediate Addressing: This mode allows a literal value to be loaded into a register.
Direct Addressing: This mode allows data to be moved to or from memory.
Register Indirect Addressing: In this mode, the target memory address is taken from a register rather than being encoded as a literal value in the instruction and this is how pointer dereferencing is implemented in C/C++.
Relative Addressing: In this mode, the target memory address is specified as an operand, and the value stored in a specified register is used as an offset from that address; this is how indexed array accesses are implemented in C/C++.
Whenever a physical memory device is assigned to a range of addresses in a computer's address space, we say that the address range has been mapped to the memory device.
Memory Mapped I/O:
An address range might also be mapped to other peripheral devices, such as a joypad or a NIC, allowing the CPU to perform I/O operations on a peripheral by reading from or writing to those addresses just as if they were ordinary RAM. Under the hood there is special circuitry that detects that the CPU is reading from or writing to a range of addresses mapped to a non-memory device and converts the read/write request into an I/O operation. (Port-mapped I/O is a similar idea, but uses a separate address space and dedicated I/O instructions.)
Video Ram (VRAM):
It's a range of memory addresses assigned for use by a video controller.
In Game Consoles (PS4 & Xbox), CPU and GPU have a shared access to a single large unified block of memory.
Virtual memory: a memory remapping feature whereby the memory addresses used by a program don't map directly to the memory modules installed in the computer.
Virtual memory pages:
The entire addressable memory space is conceptually divided into equally-sized contiguous chunks called pages.
Virtual to Physical Address Translation:
Whenever the CPU performs a memory read/write operation, the virtual address is split into two parts: a page index and an offset within that page. The page index is then looked up by the MMU in a page table that maps virtual page indices to physical ones; the offset is carried over unchanged. A sketch of the split follows.
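A sketch of how the split works, assuming 4 KiB pages (a common size); the address used here is arbitrary:

```cpp
#include <cstdint>
#include <cstdio>

int main()
{
    // With 4 KiB pages, the low 12 bits of an address are the offset
    // within the page; the remaining high bits are the page index.
    const std::uint64_t kPageSize = 4096;
    const std::uint64_t address   = 0x00007f3a12345678ull;

    std::uint64_t pageIndex = address / kPageSize;  // looked up in the page table
    std::uint64_t offset    = address % kPageSize;  // unchanged by translation

    std::printf("page index = 0x%llx, offset = 0x%llx\n",
                (unsigned long long)pageIndex,
                (unsigned long long)offset);
    return 0;
}
```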
Page Fault: If the page table indicates that a page is not mapped to a physical address, the MMU raises an interrupt which tells the OS that the memory request can't be fulfilled.
The OS handles a page fault either by mapping the requested page into physical memory (e.g., reading it back in from swap space on disk) or, if the address is genuinely invalid, by crashing the program (a segmentation fault / access violation).
Memory cache: a mechanism to mitigate the impact of high memory access latency in PCs and consoles.
L1 Cache: small but fast bank of RAM, placed very near to the CPU (on the same die).
L2 Cache: larger but somewhat slower, placed further away from the core than L1 cache.
Cache hit/miss: the data requested by the CPU is already in the cache (hit) or isn't (miss); a miss forces a much slower fetch from main memory.
Cache Lines: data moves between main RAM and the cache in contiguous chunks called cache lines (typically on the order of 64 bytes), not one byte at a time; this is why the locality patterns below matter.
Locality Of Reference:
Spatial locality: if a memory address is accessed it's likely that nearby addresses will also be accessed (iterating over an array).
Temporal locality: if a memory address is accessed it's likely that the same address will be accessed again in the near future.
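A classic sketch of spatial locality in action (array name and sizes are mine):

```cpp
#include <cstddef>

const std::size_t kDim = 1024;
static float g_grid[kDim][kDim];

// Cache-friendly: walks memory in address order, so each cache line
// fetched on a miss is fully used before moving on (spatial locality).
float SumRowMajor()
{
    float sum = 0.0f;
    for (std::size_t row = 0; row < kDim; ++row)
        for (std::size_t col = 0; col < kDim; ++col)
            sum += g_grid[row][col];
    return sum;
}

// Cache-unfriendly: jumps kDim * sizeof(float) bytes between successive
// accesses, touching a different cache line almost every time.
float SumColumnMajor()
{
    float sum = 0.0f;
    for (std::size_t col = 0; col < kDim; ++col)
        for (std::size_t row = 0; row < kDim; ++row)
            sum += g_grid[row][col];
    return sum;
}
```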
That's some chapter,,, a great deal of information and a perfect refresher.
Concurrency & Parallelism :) -> chapter 4