array, pointer and function pointer

1. differences between pointer, array, and address of array.

Let's start with a quick example: what's the difference between the following declarations?

char *a_ptr = "abcdef";
char a_array[] = "abcdef";

The above two declarations achieve three things:

1. creating storage for the variable "a_ptr" in the read/write .data section of the final binary file. Assuming a pointer is 4 bytes, this storage costs 4 bytes of the .data section.

2. creating storage of sizeof("abcdef") bytes, 7 bytes in total (including the null terminator for strings), in the .data section. This 7-byte storage is the variable "a_array".

3. creating storage for the string literal "abcdef" in the .ro section of the final binary file (ELF toolchains usually name it .rodata). This section is read only. Total 7 bytes, since the literal carries its null terminator too.

When the binary is loaded into memory, the loader does two things:

1. initialize the variable "a_ptr" so that it points to the address in the .ro section where the string literal "abcdef" is stored. Naturally, if we later do a_ptr[1] = 'g', it will hit a memory fault because we cannot write to the .ro section. (Modifying a string literal is undefined behavior in C anyway.) Hence the following code is wrong:

char *a_ptr = "abcdef";
a_ptr[0] = 'z';   // program will typically crash here with SIGSEGV as we are trying to modify read-only memory

2. set the storage of a_array (remember it got 7 bytes in the .data section for itself) to the value "abcdef", including the null terminator. Now you can do a_array[0] = 'z'. You are just changing the storage content from "abcdef" to "zbcdef".
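The difference can be checked without ever writing to the literal. This is a minimal sketch, assuming a hosted C99 compiler; patch_array is a helper name made up for this example, and a_ptr is declared const here (unlike the declaration above) so the compiler would reject an accidental write through it:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

const char *a_ptr = "abcdef";  /* pointer object; it only refers to the literal */
char a_array[] = "abcdef";     /* 7-byte array holding its own copy */

/* Overwrite the first character of the array and return the array. */
const char *patch_array(char c)
{
    /* sizeof sees the whole array: 6 characters plus the '\0'. */
    assert(sizeof a_array == 7);
    /* sizeof on the pointer is the size of a pointer, not of the string. */
    assert(sizeof a_ptr == sizeof(char *));

    a_array[0] = c;            /* fine: the array owns writable storage */
    return a_array;
}
```

After patch_array('z') the array reads "zbcdef", while the literal that a_ptr refers to is untouched.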

You can run "objdump" on the final executable and look at the corresponding sections to check the discussion above.

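Here is a dump. The tool decodes the data bytes as if they were instructions, so ignore the mnemonics and look at the addresses and sizes. You can clearly see that "a_ptr" occupies 8 bytes here (I am on a 64-bit machine) since it is a pointer; the dump places it at 0x600b50. Its content, or value, is an address where a "char" is stored.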

char *a_ptr;
char a_array[] = "abcdef";
   0x600b34 <a_array.2776>:     movslq 0x64(%rbp,%riz,2),%esp
   0x600b38 <a_array.2776+4>:   gs
   0x600b39 <a_array.2776+5>:   data16
   0x600b3a <a_array.2776+6>:   add    %al,(%rax)
   0x600b3c:    add    %al,(%rax)
   0x600b3e:    add    %al,(%rax)
   0x600b50 <a_ptr.2777>:       adc    %dl,(%rax)
   0x600b52 <a_ptr.2777+2>:     (bad) 
   0x600b53 <a_ptr.2777+3>:     add    %al,(%rax)
   0x600b55 <a_ptr.2777+5>:     add    %al,(%rax)
   0x600b57 <a_ptr.2777+7>:     add    %al,(%rax)
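If the disassembler noise is distracting, the same bytes can be inspected from inside the program. This is a minimal sketch, assuming an ASCII platform; hex_bytes is a helper made up for this example, not a library function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

char a_array[] = "abcdef";

/* Format `len` bytes starting at `obj` as space-separated hex into `out`.
   `out` must have room for at least 3 * len bytes (and at least 1). */
char *hex_bytes(const void *obj, size_t len, char *out)
{
    const unsigned char *p = obj;
    char *w = out;

    assert(obj != NULL && out != NULL);
    *w = '\0';
    for (size_t i = 0; i < len; i++) {
        /* two hex digits per byte, with a space between bytes */
        w += sprintf(w, i ? " %02x" : "%02x", (unsigned)p[i]);
    }
    return out;
}
```

With a buffer of 3 * sizeof a_array bytes, hex_bytes(a_array, sizeof a_array, buf) should give "61 62 63 64 65 66 00": six characters plus the null terminator, 7 bytes in total.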

Now reconsider the following declarations:

#include <stdio.h>

char *a_ptr = "abcdef";
char a_array[] = "abcdef";
int main(void)
{
    /* %p expects a void *, hence the casts */
    printf("%p, %p, %p\n", (void *)a_array, (void *)&a_array, (void *)&a_array[0]);
    return 0;
}

The above prints the exact same value three times, which is probably expected. Note that "a_array" has no separate pointer storage of its own: its 7 bytes in the .data section hold the string "abcdef" directly. The value of "a_array" and the address of "a_array" are the same, namely the starting address of that "abcdef" storage.
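The same observation can be written as a check instead of a printout. A small sketch; same_address is a name invented for this example:

```c
#include <assert.h>

char a_array[] = "abcdef";

/* Returns 1 if the three spellings denote the same address. */
int same_address(void)
{
    const void *decayed = a_array;      /* array converted to char *        */
    const void *whole   = &a_array;     /* pointer to the whole char[7]     */
    const void *first   = &a_array[0];  /* pointer to the first element     */

    /* There is no separate pointer object: the array is the storage. */
    assert(sizeof a_array == 7);
    return decayed == whole && whole == first;
}
```

The addresses compare equal once converted to void *, but the three expressions still have different types, which is what the rest of this section is about.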

So what's the difference between "&a_array" and "a_array"?

Well, "&" is called the "address-of operator", and "*" is called the "indirection operator".

In the above declaration, "a_array" has type "array of 7 char". In most expressions it is converted to a "pointer to char" that points at its first element, therefore you can do:

a_ptr = a_array;

"&a_array" is of type "pointer to a char array of size 7", which is not compatible with "char *", therefore you cannot do this (the compiler will diagnose the mismatched pointer types):

a_ptr = &a_array;

however, you can do the following:

char (*ptr_of_size7)[7]; // a pointer to a char array of size 7

ptr_of_size7 = &a_array;
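The type difference becomes visible as soon as you do pointer arithmetic. The following sketch measures how far "p + 1" moves for each pointer type; element_stride and array_stride are names made up for this illustration:

```c
#include <assert.h>
#include <stddef.h>

char a_array[] = "abcdef";

/* Bytes between p and p + 1 when p is a pointer to char. */
size_t element_stride(void)
{
    char *p = a_array;
    return (size_t)((p + 1) - p);
}

/* Bytes between p and p + 1 when p is a pointer to the whole array. */
size_t array_stride(void)
{
    char (*p)[7] = &a_array;
    assert(sizeof *p == 7);   /* *p is the entire array */
    /* p + 1 points one past the array; compare the two as char pointers */
    return (size_t)((char *)(p + 1) - (char *)p);
}
```

element_stride() gives 1 and array_stride() gives 7, even though both pointers start at the same address.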

Because the compiler assigns storage for pointers and arrays differently, if we define a pointer or an array in one file and use it in another file, the "extern" declaration must be consistent with the original definition.

E.g.,

Inside file “1.c”, we have:

// this is file 1.c

char *a_ptr; // storage is 4 bytes on a 32-bit machine; zero-initialized (null) because it is at file scope.

char a_array[10] = "abcd"; // 10 bytes of storage: 'a', 'b', 'c', 'd' (0x61 0x62 0x63 0x64) followed by six zero bytes.

Inside file “2.c”, we should have:

// this is file 2.c

extern char *a_ptr;

extern char a_array[];

Note that we should not do the following:

// this is file 3.c

extern char *a_array; // this is wrong.

The declaration in file 3.c above is wrong. The compiler will treat the storage of a_array as a pointer (inside file 3.c only), so the first bytes of the array are read as an address. Those bytes are 'a', 'b', 'c', 'd', which read as 0x64636261 on a 32-bit little-endian machine, and that is unlikely to be a mapped address. You will most likely hit SIGSEGV as soon as you try to dereference a_array in file 3.c. E.g., you will hit SIGSEGV if you do a_array[0] = 'z', which tries to store 'z' at address 0x64636261.
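The bogus address can be computed safely in a single file by copying the array's leading bytes into an integer, which is what a pointer-typed read of the same storage would load. This is a sketch under the assumption that uintptr_t has the same size as a pointer; bytes_read_as_pointer is a made-up helper name:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

char a_array[10] = "abcd";   /* same definition as in 1.c */

/* Copy the leading bytes of the array into a pointer-sized integer,
   mimicking what code compiled against `extern char *a_array;` loads. */
uintptr_t bytes_read_as_pointer(void)
{
    uintptr_t bits = 0;

    assert(sizeof bits <= sizeof a_array);
    memcpy(&bits, a_array, sizeof bits);
    return bits;
}
```

On a little-endian machine the result is 0x64636261 (the upper bytes are zero on 64-bit), i.e. the characters 'a', 'b', 'c', 'd' misread as an address.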

Above are the differences between pointer, array, and address of array.

A final note on the & and * operators:

As per C99, "&a[1]" is equivalent to "a+1", "&a[0]" is equivalent to "a+0", and "*&" and "&*" cancel each other. Therefore "&*a", "&a[0]" and "a" are equivalent, even when "a" is a null pointer.

Likewise, "a" and "*&a" are equivalent too, even when "a" is a null pointer.

With glibc, the following code prints "(nil) , (nil), (nil), (nil)" (other C libraries may format a null pointer differently):

char *a = NULL;

printf("%p , %p, %p, %p\n", (void *)&*a, (void *)*&a, (void *)a, (void *)&a[0]);

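On the same setup the following code prints "(nil) , (nil), (nil), 0x1", although strictly speaking "&a[1]" is "a+1" on a null pointer, which the standard leaves undefined: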

char *a = NULL;

printf("%p , %p, %p, %p\n", (void *)&*a, (void *)*&a, (void *)a, (void *)&a[1]);
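The two cancellation rules can also be checked without printing anything. A minimal sketch; roundtrip is a name invented here, and it deliberately avoids "&a[1]" on a null pointer:

```c
#include <assert.h>
#include <stddef.h>

/* Asserts that &*p and *&p both give back p, then returns &*p.
   Nothing is read through p, so a null pointer is acceptable. */
char *roundtrip(char *p)
{
    assert(&*p == p);   /* & and * cancel; *p is never evaluated */
    assert(*&p == p);   /* &p is a valid char **; reading it yields p */
    return &*p;
}
```

roundtrip(NULL) returns NULL, and roundtrip(&c) returns &c for any char object c.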

2. function pointer

Before we discuss function pointers, we need to look more closely at the "&" and "*" operators:

As per C99 6.5.3.2:

1 The operand of the unary & operator shall be either a function designator, the result of a [] or unary * operator, or an lvalue that designates an object that is not a bit-field and is not declared with the register storage-class specifier.

2 The operand of the unary * operator shall have pointer type.

3 The unary & operator returns the address of its operand. If the operand has type ‘‘type’’, the result has type ‘‘pointer to type’’. If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator. Otherwise, the result is a pointer to the object or function designated by its operand.

4 The unary * operator denotes indirection. If the operand points to a function, the result is a function designator; if it points to an object, the result is an lvalue designating the object. If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’. If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.
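Paragraph 3 is what makes "&a[i]" a pure address computation. The sketch below leans on that rule to form a one-past-the-end pointer without reading any element; the array values and the helper address_of_element are assumptions made up for this example:

```c
#include <assert.h>
#include <stddef.h>

int values[4] = {10, 20, 30, 40};

/* &values[i] is treated as values + i, so no element is read.
   i may be 0..4; index 4 yields the one-past-the-end pointer. */
int *address_of_element(size_t i)
{
    assert(i <= 4);
    return &values[i];
}
```

address_of_element(4) is a valid pointer to compare against, as long as it is never dereferenced.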

We also need to clarify the definition of "function designator".

As per C99 6.3.2.1,

A function designator is an expression that has function type. Except when it is the operand of the sizeof operator or the unary & operator, a function designator with type ‘‘function returning type’’ is converted to an expression that has type ‘‘pointer to function returning type’’.

So for the following declaration:

int function_a(int a);

Here the identifier "function_a" is a "function designator". When used in an expression, it is converted to a pointer to the function (remember the C99 quote "...is converted to an expression that has type 'pointer to function returning type'...").

Also, "&function_a" yields the same pointer as "function_a" (remember the C99 quote "...the unary & operator...result is a pointer to the function designated by its operand...").

Also, "*function_a" is equivalent to "function_a" (remember the C99 quote "...The unary * operator...if the operand points to a function, the result is a function designator...").

To quickly summarize, all of the following are equivalent:

*&function_a

&*function_a

&function_a

*function_a

**function_a

function_a
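A quick way to convince yourself is to compare the spellings as function pointers. This is a sketch only: the body given to function_a and the names fn_ptr and check_spellings are made up for the example.

```c
#include <assert.h>

/* Hypothetical body for function_a: doubles its argument. */
int function_a(int a) { return 2 * a; }

typedef int (*fn_ptr)(int);

/* Asserts that every spelling converts to the same function pointer. */
void check_spellings(void)
{
    fn_ptr base = function_a;     /* designator converted to a pointer */

    assert(base == &function_a);
    assert(base == *function_a);  /* * gives a designator, converted again */
    assert(base == **function_a);
    assert(base == *&function_a);
    assert(base == &*function_a);
}
```

Each "*" produces a function designator that is immediately converted back to a pointer, which is why any number of them can be stacked.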

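However, "&&function_a" is not valid. The compiler reads "&&" as a single token, and even when written as "& &function_a" it fails: "&" requires a function designator or an lvalue, and "&function_a" is neither, because the "&" operator returns the "address of its operand", which is a plain value (C99 6.5.3.2 paragraphs 1 and 3 above).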

As a last example, each of the following compiles and is syntactically correct:

int a = (*&function_a)(10);

int a = (&*function_a)(10);

int a = (&function_a)(10);

int a = (*function_a)(10);

int a = (**function_a)(10);

int a = (function_a)(10);

int a = function_a(10);
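The same rules apply when the pointer is stored in a variable or an array, which is how function pointers are normally used. A minimal sketch; function_b, int_fn and apply_all are hypothetical names, and the function bodies are invented for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical functions sharing one signature. */
int function_a(int a) { return a + 1; }
int function_b(int a) { return a * 2; }

typedef int (*int_fn)(int);

/* Apply each function in `table` to `x`, in order, and return the result. */
int apply_all(const int_fn *table, size_t count, int x)
{
    for (size_t i = 0; i < count; i++) {
        assert(table[i] != NULL);
        x = table[i](x);          /* same call as (*table[i])(x) */
    }
    return x;
}
```

For instance, with int_fn table[] = { function_a, &function_b }; the call apply_all(table, 2, 10) computes (10 + 1) * 2 = 22. Both initializers are accepted because "function_a" and "&function_b" have the same pointer type.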