data alignment, packed/padded structure, and flexible array member (array of length zero)

1. why do we need to align data storage, what is padding and packed data?

First, we need to define "alignment". As per C99 3.2

quote:

"Alignment: requirement that objects of a particular type be located on storage boundaries with addresses that are particular multiples of a byte address."

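What does the above mean? It means that the address of an object must be a multiple of the alignment its type requires. On many ABIs the alignment of a scalar type equals its size, so an “int” variable sits at an address that is a multiple of sizeof(int), and a “double” at an address that is a multiple of sizeof(double). This is common but not universal: some ABIs ( 32-bit x86 System V, for example ) align an 8-byte “double” on a 4-byte boundary inside a structure.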

E.g., assume the size and alignment of an “int” are both 4 bytes; for “int a”, the address of “a” should be 0x0, or 0x4, or 0x8, or 0xc, or 0x10, and so on.
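To see the rule in practice, here is a minimal sketch that checks an address against an alignment. It assumes a C11 compiler, since `alignof` from `<stdalign.h>` is a C11 addition and not part of C99. The helper names `is_aligned` and `int_object_is_aligned` are made up for illustration.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the address in p is a multiple of
 * `alignment`, 0 otherwise. Assumes uintptr_t exposes the address value,
 * which holds on common flat-memory platforms. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Hypothetical check: an ordinary int object must sit on an
 * alignof(int) boundary. */
static int int_object_is_aligned(void)
{
    int a = 0;
    return is_aligned(&a, alignof(int));
}
```

Calling `int_object_is_aligned()` from main should return 1 on any conforming implementation, whatever value `alignof(int)` happens to have.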

So why do we need alignment at all? Why do we need to store different types of data at particular boundaries? It is mainly a requirement of the ABI (application binary interface) for a specific CPU. The hardware may impose such a requirement for the following reasons:

1. Efficiency. A 64-bit CPU operates most efficiently on 8-byte boundaries, since its general-purpose registers and data paths are 64 bits wide. Reading or writing an aligned 8-byte value typically costs the same as reading or writing 1 to 7 bytes, whereas a value that straddles a boundary may need two memory accesses.

2. Hardware limitation. Because of how memory is wired to the CPU ( how many banks, the width of each bank, etc. ), some CPUs can only access a multi-byte value at an address that is a suitable multiple of bytes. On such hardware a misaligned access may or may not raise SIGBUS, depending on the CPU and the operating system. Note that the x86 architecture is a particular exception here. On x86, a pointer can hold an odd address and storing an integer through it will work. ( int *c = (int *)0x8001; *c = 0x12345678; on an Intel CPU, the four bytes of 0x12345678 are stored at addresses 0x8001 to 0x8004 without SIGBUS occurring; on PowerPC, you may get SIGBUS. Either way, C itself treats the misaligned access as undefined behavior. )
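When data really does live at an unaligned address, a portable way to touch it is to copy the bytes instead of dereferencing a misaligned pointer. The following is a minimal sketch; the function names `load_u32_unaligned` and `store_u32_unaligned` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from an address that may not be
 * 4-byte aligned. memcpy works on bytes and has no alignment requirement,
 * so no misaligned uint32_t pointer is ever dereferenced. */
static uint32_t load_u32_unaligned(const unsigned char *src)
{
    uint32_t value;
    memcpy(&value, src, sizeof value);
    return value;
}

/* Hypothetical helper: the matching store. */
static void store_u32_unaligned(unsigned char *dst, uint32_t value)
{
    memcpy(dst, &value, sizeof value);
}
```

Compilers commonly turn these small fixed-size memcpy calls into a single load or store where the target allows it, so the portable form usually costs nothing on x86.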

Alignment is determined at compile time. During compilation, the compiler may insert extra space between two members of a struct ( and at the end of a struct or union ) to satisfy the alignment requirements of the members. The extra storage is called padding. Padding bytes cannot be initialized or read portably: their values are unspecified ( as per C99 ), so a program must not rely on them:

as per C99 section 6.2.6.1 paragraph 6,

quote:

When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.

If a compiler is told not to honor the natural alignment of the members of a struct or union, and to allocate only the minimum storage, the result is called a "packed" data structure. A packed data structure saves storage space but needs careful handling, especially when pointers to its members are taken, or when it is exchanged between processes on heterogeneous CPUs ( e.g. PowerPC and Intel ).

Note that, beyond the hardware requirements and limitations, it is by default up to the compiler to decide the alignment. A compiler can simply impose stricter boundaries, e.g. demanding that an "int" sit on an 8-byte boundary instead of a 4-byte one, even though sizeof(int) is 4.

Some compilers ( notably gcc ) let the user enforce an alignment other than the default, or remove padding completely ( packing ), with type attributes. Following are two examples:

struct S { short a; char b; } __attribute__ ((aligned (8)));

struct S { short a; char b; } __attribute__ ((aligned (16), packed));
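To observe what these attributes do, one can print sizeof and _Alignof for a few variants. This is a sketch that assumes GCC or Clang ( the attribute syntax is an extension ) and a C11 compiler for `_Alignof`. The struct names below are hypothetical and chosen only so that all variants can coexist in one file.

```c
#include <stddef.h>
#include <stdio.h>

/* GCC/Clang attribute syntax; other compilers may not accept it. */
struct s_default  { short a; char b; };
struct s_aligned8 { short a; char b; } __attribute__ ((aligned (8)));
struct s_packed16 { short a; char b; } __attribute__ ((aligned (16), packed));
struct s_packed   { char b; short a; } __attribute__ ((packed));

/* Print size and alignment of each variant so they can be compared. */
static void print_attribute_effects(void)
{
    printf("default   : size %zu, align %zu\n",
           sizeof(struct s_default), _Alignof(struct s_default));
    printf("aligned 8 : size %zu, align %zu\n",
           sizeof(struct s_aligned8), _Alignof(struct s_aligned8));
    printf("aligned 16, packed : size %zu, align %zu\n",
           sizeof(struct s_packed16), _Alignof(struct s_packed16));
    printf("packed    : size %zu, align %zu\n",
           sizeof(struct s_packed), _Alignof(struct s_packed));
}
```

Note that "aligned" raises the alignment of the whole structure ( and therefore rounds its size up ), while "packed" removes the padding between members.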

One interesting question about alignment: if the size of a structure is 128 bytes, must the starting address of that structure always be 128-byte aligned? The answer is no. A structure only needs the alignment of its most strictly aligned member, as defined by the ABI, and for the fundamental types that is typically no more than 8 or 16 bytes.

2. alignment and malloc

As per C99 section 7.20.3:

The order and contiguity of storage allocated by successive calls to the calloc, malloc, and realloc functions is unspecified. The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated

Therefore, you can malloc() any size you want and use the result as a pointer to any object type that fits in it. It is guaranteed to be suitably aligned. In practice, malloc simply returns an address aligned to the strictest fundamental alignment of the underlying platform ( typically 8 or 16 bytes ).
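A quick way to check this guarantee on a given platform is to compare the returned address with the alignment of `max_align_t`. The sketch below assumes C11 ( `max_align_t` and `alignof` are not in C99 ), and the function name is hypothetical.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical check: allocate n bytes and report whether the pointer is
 * aligned for max_align_t, the type with the strictest fundamental
 * alignment. Returns 1 if aligned, 0 if not, -1 if the allocation failed. */
static int malloc_result_is_max_aligned(size_t n)
{
    void *p = malloc(n);
    if (p == NULL)
        return -1;
    int ok = ((uintptr_t)p % alignof(max_align_t)) == 0;
    free(p);
    return ok;
}
```

For a request at least as large as `alignof(max_align_t)`, for example `malloc_result_is_max_aligned(64)`, I would expect 1 on mainstream C libraries.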

3. why do we need to pad a structure?

So how is alignment related to structure padding? Structure padding is needed to satisfy the alignment requirements.

As per C99, 6.7.2.1

quote:

“Each non-bit-field member of a structure or union object is aligned in an implementation defined manner appropriate to its type.”

“There may be unnamed padding within a structure object, but not at its beginning.”

“There may be unnamed padding at the end of a structure or union.”

The above C99 quotes say that a structure has no padding at its start, so its starting address must itself be suitably aligned, but there may be padding between elements (called internal padding) as well as at the end of the structure (called trailing padding).

Let’s assume that sizeof(int) is 4 and sizeof(double) is 8 on a 32-bit CPU, and that each type is aligned to its size. Take the following as an example:

typedef struct _a {

char a;

int b;

char c;

} a_t;

a_t *a = NULL;

Here, if we take the address &(a->b), the value must satisfy the alignment requirement of “int”. Since that is 4 bytes in this case, and sizeof(char) is one byte ( demanded by C99 ), the compiler puts three bytes of padding between a->a and a->b: the address of a->a is 0 and the address of a->b is 0x4. The three padding bytes between a->a and a->b take unspecified values ( meaning any value is possible, as per C99 6.2.6.1 ).

One can understand internal padding, but then why do we need trailing padding?

Assume we declare the following:

a_t a[2];

Here a[0] and a[1] must be contiguous in memory (demanded by C99 6.5.6 for pointer arithmetic).

Without the trailing padding in a[0], the address of a[1].b would not necessarily be 4-byte aligned. Hence we need trailing padding too.

Note that the internal padding and trailing padding together make the above “a_t” 12 bytes in total.

As per C99 section 6.5.3.4

“sizeof(), when applied to an operand that has structure or union type, the result is the total number of bytes in such an object, including internal and trailing padding.”

Therefore, sizeof(a_t) is 12.
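The layout described above can be checked with offsetof() and sizeof(). This is a small sketch that reuses the member list of “a_t” ( the struct tag is omitted here ). The two function names are hypothetical, and the numbers in the comments assume a 4-byte “int” with 4-byte alignment.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct {
    char a;   /* offset 0, followed by 3 bytes of internal padding */
    int  b;   /* offset 4 under the stated assumption */
    char c;   /* offset 8, followed by 3 bytes of trailing padding */
} a_t;

/* Distance in bytes between two consecutive array elements. */
static size_t a_t_array_stride(void)
{
    a_t arr[2];
    return (size_t)((char *)&arr[1] - (char *)&arr[0]);
}

/* Print the offsets and the total size chosen by this compiler. */
static void print_a_t_layout(void)
{
    printf("offsetof(a_t, a) = %zu\n", offsetof(a_t, a));
    printf("offsetof(a_t, b) = %zu\n", offsetof(a_t, b));
    printf("offsetof(a_t, c) = %zu\n", offsetof(a_t, c));
    printf("sizeof(a_t)      = %zu\n", sizeof(a_t));
    printf("array stride     = %zu\n", a_t_array_stride());
}
```

The array stride always equals sizeof(a_t), which is exactly why the trailing padding has to be part of the size.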

For the same reason, the following structure has size of 24.

typedef struct _b {

char a; //7 bytes internal padding after “char a” to make the address of “b” 8 bytes aligned.

double b;

char c; //7 bytes trailing padding after “char c” to make the whole structure 8 bytes aligned.

} b_t;

The rules of padding are:

1. Internal padding is such that the next element of the structure is properly aligned according to the next element's type.

2. Trailing padding is such that the size of the whole structure is a multiple of the alignment of the element with the strictest alignment requirement, whether demanded by the hardware or chosen by the compiler.

"Strictest alignment" means that if the most strictly aligned element inside a structure is a pointer on a 64-bit machine, where a pointer is 8 bytes with 8-byte alignment, the trailing padding will make sure the size of the whole structure is a multiple of 8 bytes.

E.g:

typedef struct _b {

int a; //4 bytes internal padding after “int a” to make the address of “b” 8 bytes aligned.

int *b;

char c; //7 bytes trailing padding after “char c” to make the whole structure 8 bytes aligned.

} b_t;

here sizeof(b_t) is 24 bytes, sizeof(int*) is 8 bytes and sizeof(int) is 4 bytes
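The two padding rules can be written down as a tiny layout calculator. This is only a model of what a typical ABI does, not what any particular compiler is obliged to do, and all names in it ( `member_desc`, `round_up`, `layout_struct` ) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical description of one member: size and alignment in bytes. */
struct member_desc {
    size_t size;
    size_t align;
};

/* Round n up to the next multiple of align (align must be nonzero). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* Rule 1: place each member at the next offset that suits its alignment.
 * Rule 2: round the total size up to the strictest member alignment.
 * Stores each member's offset in offsets[i] and returns the padded size. */
static size_t layout_struct(const struct member_desc *m, size_t count,
                            size_t *offsets)
{
    size_t pos = 0;
    size_t max_align = 1;

    for (size_t i = 0; i < count; i++) {
        pos = round_up(pos, m[i].align);   /* internal padding */
        offsets[i] = pos;
        pos += m[i].size;
        if (m[i].align > max_align)
            max_align = m[i].align;
    }
    return round_up(pos, max_align);       /* trailing padding */
}
```

Feeding it { int, pointer, char } with sizes and alignments { 4, 8, 1 } gives offsets 0, 8, 16 and a total of 24, which matches the example above.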

It is very important to note that internal padding and trailing padding are not mandated by the standard; in fact, GCC has options to make a structure “packed” without any padding at all (provided the underlying hardware supports such unaligned data access).

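Note that a compiler cannot pad the elements of a structure array differently from a standalone instance of the structure: sizeof is a property of the type, and array elements are exactly sizeof bytes apart, so any trailing padding is present in both cases.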

Padding is a waste of memory. To reduce such waste, one should declare the elements in descending order of alignment ( which usually means descending order of size ). The compiler cannot do this for us: C99 6.7.2.1 requires that the members of a structure have addresses that increase in the order in which they are declared.

struct _a { char a; int b; char c; } has size of 12 bytes

struct _a { int b; char a; char c; } has size of 8 bytes
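The saving from reordering is easy to measure. A minimal sketch follows; the struct names `loose` and `tight` are hypothetical, and the sizes in the comments assume a 4-byte “int” with 4-byte alignment.

```c
#include <stddef.h>

/* Same three members, declared in two different orders. */
struct loose { char a; int b; char c; };   /* 12 bytes under the assumption */
struct tight { int b; char a; char c; };   /*  8 bytes under the assumption */

/* Bytes saved per object by declaring the int first. */
static size_t bytes_saved_by_reordering(void)
{
    return sizeof(struct loose) - sizeof(struct tight);
}
```

For a large array of such structures the difference adds up: 4 bytes per element here, a third of the original size.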

4. padding and memcmp

As per C99 section 6.2.6.1

When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.

Therefore memcmp() may not work for determining whether two structure instances are equal. You probably have to compare member by member. But I would first question why such a comparison is needed, and whether there is a better data structure for the purpose.
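A member-by-member comparison could look like the sketch below. The names `a_t_equal` and `make_pair_with_different_padding` are hypothetical; the second function only exists to build two values whose padding bytes were pre-filled with different patterns.

```c
#include <string.h>

typedef struct {
    char a;
    int  b;
    char c;
} a_t;

/* Hypothetical helper: compare the members only, ignoring padding bytes. */
static int a_t_equal(const a_t *x, const a_t *y)
{
    return x->a == y->a && x->b == y->b && x->c == y->c;
}

/* Fill the two objects with different byte patterns first, then give all
 * members the same values. The members now match, but nothing in the
 * standard says the padding bytes do. */
static void make_pair_with_different_padding(a_t *x, a_t *y)
{
    memset(x, 0x00, sizeof *x);
    memset(y, 0xFF, sizeof *y);
    x->a = y->a = 'k';
    x->b = y->b = 42;
    x->c = y->c = 'z';
}
```

For such a pair a_t_equal() reports equality, whereas memcmp() over sizeof(a_t) bytes would typically report a difference, although the standard guarantees neither outcome for the padding.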

5. impact of padding on offsetof() and sizeof()

offsetof() is the offset in bytes of a structure member from the beginning of the structure. sizeof() is the total storage in bytes allocated to the data structure, including any internal and trailing padding bytes. In the case of a string literal, sizeof() includes the trailing null terminator too, since the system has to allocate a byte for it in memory as well.

E.g.

typedef struct _s1 {

char a; // 7 bytes internal padding after “char a” to make the address of “b” 8 bytes aligned.

double b; // assume "double" is 8 bytes

char c; // no padding after this element

char d[1]; // 6 bytes trailing padding after “char d[1]” to make the whole structure 8 bytes aligned.

} s1_t;

offsetof(s1_t, d) is 17, sizeof(s1_t) is 24.

sizeof("abcd") is 5 because the compiler needs to allocate one more byte for the null terminator at the end of the string.

Now consider the following structure:

typedef struct _s2 {

char a; // 7 bytes internal padding after “char a” to make the address of “b” 8 bytes aligned.

double b; // assume "double" is 8 bytes

char c; // 7 bytes trailing padding after this element

char d[]; // no trailing padding. "d" should start at 24th bytes.

} s2_t;

Here element “d[]” is called a “flexible array member”, which occupies no storage space. Note that we can use "[0]" instead of "[]" to indicate a flexible array member, but that is a GCC extension, not really part of C99. ( refer to https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html )

We may think that storage of "s2_t.d[]" would start right after storage of "s2_t.c". In fact, it is quite opposite.

As per C99 section 6.7.2.1,

“As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. With two exceptions, the flexible array member is ignored. First, the size of the structure shall be equal to the offset of the last element of an otherwise identical structure that replaces the flexible array member with an array of unspecified length. Second, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it."

To explain the above text, let's take the above s2_t example.

The first exception basically states that sizeof(s2_t) should be equal to offsetof(s2_t, d), which means the last element, the flexible array member, does not participate in the trailing padding of the structure. In fact, under this reading the trailing padding needs to come “before” the flexible array member, and the storage of the flexible array member begins right after the trailing padding bytes, "as if it does not exist" ( of course the compiler will not assign storage for the flexible array member, since it has an incomplete type ).

take another example:

struct s1 { int a; char b; char c[]; };

struct s2 { int a; char b; char c[123]; };

struct s3 { int a; char b; char c[1]; };

Here sizeof(struct s1) should be equal to offsetof(struct s2, c). Under the reading above, where sizeof(struct s1) is the padded size of 8 bytes, this means there should be 3 bytes of padding between “b” and “c” in "struct s1".

And inevitably, sizeof(struct s1) should be equal to offsetof(struct s1, c) as well. This is in sharp contrast to sizeof(struct s3) ( 8 bytes ) and offsetof(struct s3, c) ( 5 bytes ).

C99 clearly made an effort to make sure that the storage of the flexible array member begins only "after" the storage of the whole structure, as if the flexible array member did not exist.

Take another example:

struct s1 { int a; char b; char c[]; };

struct s1 a;

a.a = 1;

a.b = 2;

a.c[0] = 3; // the compiler assigns no storage for the flexible array member, so this is undefined behavior and may cause SIGSEGV, but assume not for now.

struct s1 b = a;

Here b.c[0] is not guaranteed to have the value 3, since the flexible array member is not part of what a “struct s1” assignment copies.

The second exception in the above C99 quotation basically states that access to the flexible array member is defined only within the object being accessed. For a plain "struct s1" variable that means only as far as the trailing padding goes; any access beyond the trailing padding bytes is undefined.

now the interesting part between offsetof() and sizeof().

GCC disagrees with the above interpretation of the flexible array member and in fact places the flexible array member "before" the trailing padding, which makes sizeof(struct s1) different from offsetof(struct s1, c). In effect, it makes "struct s1" behave like "struct s3".

#include <stdlib.h>

#include <stdio.h>

#include <inttypes.h>

typedef struct {

char a;

int b;

char c;

char d[];

} test_x;

int main(void)

{

test_x x = {0};

return 0;

}

gcc -std=c11 1.c -g

(gdb) p sizeof(x)

$1 = 9

(gdb) p sizeof(test_x)

$2 = 12

(gdb) p &x

$3 = (test_x *) 0x7fffffffdcc0

(gdb) ptype x

type = struct {

char a;

int b;

char c;

char d[];

}

(gdb) p &x.d

$4 = (char (*)[]) 0x7fffffffdcc9

(gdb) p &x.c

$5 = 0x7fffffffdcc8 ""

Refer to the following links for more details on this big discrepancy:

http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n987.htm

http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n983.htm

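GCC hence asked for the C99 wording to be changed. The wording was in fact revised: C11 6.7.2.1 says that the size of the structure is as if the flexible array member were omitted, except that it may have more trailing padding than the omission would imply, which permits the GCC layout.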
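Rather than relying on gdb, the relationship between sizeof() and offsetof() can be checked from the program itself. The sketch below reuses the test_x layout from the session above; the function names are hypothetical, and the numbers quoted afterwards are what I would expect from GCC on x86-64, not something the standard promises.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct {
    char a;
    int  b;
    char c;
    char d[];   /* flexible array member */
} test_x;

/* Number of bytes of the flexible array member that fall inside
 * sizeof(test_x), i.e. that overlap the trailing padding. */
static size_t fam_bytes_inside_struct(void)
{
    return sizeof(test_x) - offsetof(test_x, d);
}

/* Print what this compiler chose. */
static void print_fam_layout(void)
{
    printf("sizeof(test_x)      = %zu\n", sizeof(test_x));
    printf("offsetof(test_x, d) = %zu\n", offsetof(test_x, d));
    printf("overlap with padding = %zu\n", fam_bytes_inside_struct());
}
```

With GCC on x86-64 this reports a size of 12 and an offset of 9 for “d”, which agrees with the addresses printed by gdb above ( &x.d is 9 bytes past &x ).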

6. why do we have flexible array member?

An argument for the concept of flexible array member is documented in "C99 Rationale page 73". The intention is to have a facility to append payload data right after the storage of “struct s1”, so that the payload does not interfere with the members of the original structure, both in coding and when communicating data from one process to another.

let's take a look at the following example:

struct s1 { int a; char b; char c[]; };

A typical use of the above structure is:

struct s1 *s1_and_payload = malloc( sizeof ( struct s1 ) + 155 ); // 155 bytes payload.

s1_and_payload->c[0] = 'a'; // first byte of the payload.

Here the storage of the payload would be right after the "sizeof( struct s1 )", or "offsetof(struct s1, c)".

If we send the above data to another process, on the receiver side, code can look like:

char *payload = data + offsetof(struct s1, c);

// alternatively: char *payload = data + sizeof(struct s1); this works only where sizeof() and offsetof() give the same value.

The above examples show the use of a flexible array member to communicate or carry a payload whose size cannot be known in advance.
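The allocation pattern can be wrapped in a small helper. This is a sketch under assumptions: the name `s1_create` is hypothetical, and storing the payload length in member “a” is my own choice for illustration, not something the structure above prescribes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct s1 { int a; char b; char c[]; };

/* Hypothetical helper: allocate a struct s1 followed by room for `len`
 * payload bytes and copy the payload in. Returns NULL on failure.
 * sizeof(struct s1) + len may allocate a few bytes more than strictly
 * needed when the flexible array member overlaps the trailing padding. */
static struct s1 *s1_create(const void *payload, size_t len)
{
    if (len > SIZE_MAX - sizeof(struct s1))
        return NULL;                      /* size computation would overflow */

    struct s1 *p = malloc(sizeof *p + len);
    if (p == NULL)
        return NULL;

    p->a = (int)len;                      /* assumption: a holds the length */
    p->b = 0;
    if (len > 0)
        memcpy(p->c, payload, len);
    return p;
}
```

A caller would write something like `struct s1 *p = s1_create("hello", 6);` and release the whole thing with a single `free(p)`.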

However, the difference between sizeof(struct s1) and offsetof(struct s1, c) under GCC means that whenever we use a flexible array member, we must carefully check the compiler's padding and understand the exact starting location of the flexible array before we decide whether to use sizeof() or offsetof().

Even on a system where offsetof() and sizeof() conform to C99, things can easily go wrong as well.

Consider the following use case where we want to send integers as payload from one process to another process. We need to define a header and a payload data structure. The total amount of data transferred should be “header + payload”.

on sending side, we have:

struct payload { int payload; } ;

struct header1 { int total_payload; char dummy; char payload[]; };

struct header2 { int total_payload; char dummy; struct payload payload[]; };

on receiving side:

struct header1 *header = 0x0; // assume address 0x0 will work

struct payload *p = (struct payload *)( &header->payload[0] );

int a = p->payload;

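The above code may trigger SIGBUS on strict-alignment hardware, because the address held in “p” is not 4-byte aligned. If the address of header is 0x0, the address of header->payload would be 0x5 as per GCC ( 4 bytes of “int”, 1 byte of “char”, and no padding before a “char” array ).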

Therefore we should use the “header2” declaration, which guarantees that the starting address of the flexible array member is aligned according to the alignment of its element type.

struct header2 *header = 0x0;

struct payload *p = (struct payload *)( &header->payload[0] );

int a = p->payload; // no SIGBUS as value of p would be 0x8, instead of 0x5.
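The difference between the two headers can be checked without dereferencing anything, by comparing the offset of the flexible array member with the alignment of the payload type. A minimal sketch, assuming C11 for `_Alignof`; the two function names are hypothetical.

```c
#include <stddef.h>

struct payload { int payload; };
struct header1 { int total_payload; char dummy; char payload[]; };
struct header2 { int total_payload; char dummy; struct payload payload[]; };

/* 1 if the flexible array member of header1 starts on a boundary that is
 * suitable for struct payload, 0 otherwise. */
static int header1_payload_is_aligned(void)
{
    return offsetof(struct header1, payload) % _Alignof(struct payload) == 0;
}

/* Same check for header2, whose flexible array member has the payload
 * type itself, so the compiler must align it. */
static int header2_payload_is_aligned(void)
{
    return offsetof(struct header2, payload) % _Alignof(struct payload) == 0;
}
```

With a 4-byte “int” I would expect the first function to return 0 and the second to return 1, since the offsets are 5 and 8 respectively.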

Things would be more complicated if the payload is communicated between two processes that are running on two heterogeneous CPUs with different alignment requirements ( e.g. the sender is a 32-bit CPU and the receiver is a 64-bit CPU ). In such cases, TLV ( type, length, value ) encoding is clearly a good choice to make the code portable across compilers and hardware.
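To make the TLV idea concrete, here is a minimal sketch of an encoder and decoder that work byte by byte, so neither struct padding nor host alignment or endianness leaks into the wire format. The format itself ( a 1-byte type, a 2-byte big-endian length, then the value ) and the names `tlv_encode` and `tlv_decode` are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire format: type (1 byte), length (2 bytes, big-endian),
 * then `length` bytes of value. */
enum { TLV_HEADER_SIZE = 3 };

/* Write one TLV record into out. Returns the number of bytes written,
 * or 0 if the buffer is too small. */
static size_t tlv_encode(unsigned char *out, size_t out_cap, uint8_t type,
                         const void *value, uint16_t len)
{
    size_t total = (size_t)TLV_HEADER_SIZE + len;
    if (out_cap < total)
        return 0;
    out[0] = type;
    out[1] = (unsigned char)(len >> 8);
    out[2] = (unsigned char)(len & 0xFF);
    if (len > 0)
        memcpy(out + TLV_HEADER_SIZE, value, len);
    return total;
}

/* Parse one TLV record from in. On success stores the type, a pointer to
 * the value inside `in`, and its length, and returns the bytes consumed.
 * Returns 0 if the input is truncated. */
static size_t tlv_decode(const unsigned char *in, size_t in_len,
                         uint8_t *type, const unsigned char **value,
                         uint16_t *len)
{
    if (in_len < (size_t)TLV_HEADER_SIZE)
        return 0;
    uint16_t n = (uint16_t)((in[1] << 8) | in[2]);
    if (in_len - (size_t)TLV_HEADER_SIZE < n)
        return 0;
    *type = in[0];
    *value = in + TLV_HEADER_SIZE;
    *len = n;
    return (size_t)TLV_HEADER_SIZE + n;
}
```

The receiver should still copy multi-byte values out of the buffer with memcpy() before using them, because the value pointer returned by the decoder carries no alignment guarantee.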