C/C++ data flow analysis rules (Cause of control flow failure in C)

Post date: Jun 10, 2011 10:14:24 AM

The rules are categorized into errors, mistakes, warnings, security, and portability so as to help you decide which to enable.

You should consider portability checks if you are concerned about porting your code from machines of one width (like 32-bit machines) to machines of another width (like 64-bit machines).

In contrast, an error is something we expect everybody to want to see, while very few of you will turn on warnings. Mistakes are somewhere between errors and warnings. Depending on your coding style, they may or may not be symptoms of something serious.

Restriction: On Windows®, if you are analyzing files that contain multibyte characters or have multibyte characters in their filename or file path, C/C++ Data Flow Analysis will produce no results.

Errors

Do not use unassigned variables

The value of xyz is used, but it has not been assigned or initialized. In this case, xyz will evaluate to an arbitrary bit pattern, which is rarely desired.

If the results indicate that the variable has not been assigned, there is not a single assignment to xyz anywhere in the function. If the results indicate that the variable has not been initialized, there exists a feasible path to the usage of xyz that bypasses all assignments (even though there may be some assignments to xyz). If so, the path is displayed.

If the results indicate this error, you should not assume that you simply need to initialize the variable. The majority of uninitialized variables are not caused by lack of initialization at the declaration, but rather by some mistake along the identified path.

Example:

int ival;

void func()
{
    struct foo { int x; int y; } v1, v2;
    v1.x = 42;
    v2 = v1;
    func2(v1);
    ival = v1.y;
}

It can happen that only a piece of a variable is initialized, and the remainder is not. For example, look at this C example:

int func()
{
    union { int i; short s[2]; } x;
    x.s[0] = 0;
    return x.i;
}

In this case, the message will be:

uninitialized 'x.i[16:31]'

The notation [16:31] refers to bit positions inside x.i.
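As a hedged illustration of the earlier advice to fix the mistake along the reported path rather than initializing at the declaration, the sketch below uses a hypothetical function severity_level (not part of the tool or its documentation). The buggy shape is shown only in the comment; the compiled function handles the path that previously bypassed every assignment.

```c
/* Hypothetical example: map a severity letter to a numeric level.
 *
 * A buggy version would look like this:
 *
 *     int level;
 *     if (c == 'e') level = 3;
 *     else if (c == 'w') level = 2;
 *     return level;      // any other letter bypasses both assignments
 *
 * Writing "int level = 0;" would silence the report, but the real mistake
 * is the unhandled case, so the fix belongs on that path. */
int severity_level(char c)
{
    int level;

    if (c == 'e') {
        level = 3;
    } else if (c == 'w') {
        level = 2;
    } else {
        level = -1;   /* explicit "unknown" result for every other letter */
    }
    return level;
}
```

The value -1 for unknown letters is an assumption made for the sketch; the point is only that the decision is made explicitly on the path that was missing it.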

Do not perform invalid operations involving NULL pointers

Example 1:

a = &(p->f); b = &(q[2]); c = r + 2;

The C standard considers these invalid if p, q, or r is a NULL pointer.

This complaint is suppressed when you explicitly write

a = &(NULL->f);

because that is a common way of calculating the offset of the field f (typically via the offsetof macro).

Example 2:

i = p->f; j = *p;

If p is NULL, the assignment to j involves dereferencing a NULL pointer, which will be reported as Do not dereference a NULL pointer. Do not perform invalid operations involving NULL pointers is closely related to the dereferencing of a NULL pointer, but it is not the same thing. Consider the above assignment to i, and suppose that the field f is at offset 4. If p is NULL, then p->f does not constitute dereferencing NULL (in other words, address 0) but dereferencing the address 4.
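A minimal sketch of how the address computations above can be kept valid, assuming a hypothetical struct record with a field f. The standard offsetof macro from stddef.h yields the field offset without writing the NULL-based expression by hand, and the helper address_of_f (also hypothetical) refuses to compute a field address from a NULL pointer.

```c
#include <stddef.h>

struct record {
    int id;
    int f;
};

/* offsetof computes the field offset without forming &(NULL->f) by hand. */
size_t offset_of_f(void)
{
    return offsetof(struct record, f);
}

/* Returns the address of p->f, or NULL when p itself is NULL, so the
 * address computation is never applied to a NULL pointer. */
int *address_of_f(struct record *p)
{
    return p ? &p->f : NULL;
}
```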

Do not deallocate previously deallocated memory

This complaint is followed by a path through the program. This path ends in a deallocation, such as free(xyz). Somewhere along that path, there is another deallocation of xyz.

Deallocating a memory location more than once can lead to memory corruption and random crashes much later in the program.

Common causes include using two pointers to the same memory instead of making a copy of an object, or falling through an unexpected path of code where a second deallocation is occurring.

Take a look at where the memory was allocated, and see if there is any reason that there is more than one place that the memory can be deallocated. There should be a one-to-one correspondence between allocations and deallocations for each pointer.
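One of the causes named above is sharing a single allocation between two pointers when a copy was intended. The following sketch, using a hypothetical helper copy_string, gives each owner its own allocation so that every allocation is matched by exactly one free.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a freshly allocated copy of s, or NULL if allocation fails.
 * The caller owns the result and frees it exactly once. */
char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;      /* include the terminating '\0' */
    char *copy = malloc(n);

    if (copy != NULL) {
        memcpy(copy, s, n);
    }
    return copy;
}
```

With this helper, code that previously wrote b = a and later freed both pointers can write b = copy_string(a) instead, and freeing a and b then releases two distinct blocks.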

Do not access deallocated memory

This complaint is followed by a path through the program. This path ends in an access to a memory location xyz which could be an assignment or use of the value of xyz. However, somewhere along the path, the location xyz was previously deallocated. Once memory is deallocated, its value is undefined, and should not be used.

This is commonly caused by having more than one pointer to a memory location and deallocating the memory through one of them. All of the other pointers will still contain the correct address, but the value at that address cannot be determined and should never be used.

To investigate, ascertain where the memory is being used and why it is possible to get to that point after the memory has been deallocated.

Do not dereference a NULL pointer

This complaint is followed by a path through the program. This path ends in the dereferencing of a pointer, for example *xyz. When dereferenced, xyz will be NULL, which is a problem because dereferencing a NULL pointer is likely to result in unpredictable behavior. There are two ways the path might cause xyz to be considered NULL:

by an assignment, for example
- xyz = NULL;
by a test, for example
- if ( xyz )

Any path that follows the else branch of such a test implies that xyz is NULL.

Common causes include forgetting to check if a pointer is NULL before using it, or moving through a path of code where the pointer is expected to be valid, but is NULL instead.

Do not invoke a method with a NULL class pointer

This complaint is followed by a path through the program. This path ends in an expression of the form xyz->foo(). When dereferenced, xyz will be NULL, which is a problem because calling a method via a NULL class pointer can have unpredictable results, and will not work in many cases. There are two ways the path might cause xyz to be considered NULL:

by an assignment, for example
- xyz = NULL;
or by a test, for example,
- if ( xyz )

Any path that follows the else branch of such a test implies that xyz is NULL.

Common causes include forgetting to check if a pointer is NULL before using it, or moving through a path of code where the pointer is expected to be valid, but is NULL instead.

The difference between this rule and Do not dereference a NULL pointer is that this rule refers to the special case of invoking a class member foo through a pointer xyz. This may not automatically lead to a failure, provided the method foo handles the possibility that this is NULL. However, allowing this to be NULL can lead to unpredictable results (for example, it could fail for virtual functions).

Do not access an array beyond its bounds

This complaint is followed by a path through the program. This path ends in one of:

a dereferenced pointer, for example *xyz
an array that is subscripted, for example A[I] (which is the same as *(A+I))
a called function that accesses a buffer, for example strcpy(A, B)

Reading from memory outside of an array's bounds may result in a random value. Writing into memory outside of an array's bounds may result in memory corruption.

Common causes include not checking a variable's value before using it to index an array, or using a value that is the same as the number of array elements (which is too large to actually index the array with). A common cause of buffer overflow is forgetting to terminate a string with a null character.

To investigate the problem, look at the index that is being used. If it is a variable, ensure that it can only contain values that are within the array's bounds. If it is not a variable, ensure the declaration of the array is large enough - and ensure that the index is correct.

C/C++ data flow analysis assumes a memory model where all allocations (on heap as well as stack) are independent of each other. It may issue Do not access an array beyond its bounds if you are assuming allocations are not independent, such as assuming that memory is allocated consecutively.

Example 1:

typedef struct { int a; int b[10]; int c; } abc;

void f(int *p)
{
    abc *s = (abc *) malloc(sizeof(abc));
    s->b[9] = 0;
    s->b[10] = 0;
    s->b[11] = 0;
    s->b[-1] = 0;
    s->b[-2] = 0;
    p[-1] = 0;
}

Note the difference between s->b[10] and s->b[11] in the example. The latter most likely accesses beyond the allocated memory, while the former does not: it is likely to access the field c (depending on how the compiler lays out the struct). Both are rarely intended, but s->b[11] has two potential problems, while s->b[10] has only one.

Example 2:

extern struct { int a; int b[1]; } *s;
s->b[10] = 0;
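The investigation steps for this rule (check the index against the bounds, and make sure strings are terminated) can be sketched as follows. TABLE_SIZE, table_store, and copy_bounded are hypothetical names chosen for the illustration.

```c
#include <stddef.h>
#include <stdio.h>

#define TABLE_SIZE 10

/* Stores value at table[index] only when index is a valid subscript.
 * Returns 1 on success and 0 when the index is out of range. */
int table_store(int table[TABLE_SIZE], size_t index, int value)
{
    if (index >= TABLE_SIZE) {     /* TABLE_SIZE itself is already too large */
        return 0;
    }
    table[index] = value;
    return 1;
}

/* Copies src into dst, truncating if needed. snprintf always writes the
 * terminating '\0' when dst_size is nonzero.
 * Returns 1 if the whole string fit, 0 if it was truncated. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    int needed = snprintf(dst, dst_size, "%s", src);

    return needed >= 0 && (size_t)needed < dst_size;
}
```

Using an unsigned index type means a negative subscript such as -1 arrives as a very large value and is rejected by the same comparison.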

Avoid passing pointers which point to deallocated memory

This complaint is followed by a path through the program. This path ends in the use of a variable which is a pointer that was previously freed along the path. For example,

free ( xyz ); a = xyz; foo ( xyz ); if (xyz) ... b = (xyz == another_ptr) ? c : d;

Typically, you will not want a dangling pointer to deallocated memory. Using a pointer to deallocated memory can have unpredictable results because it is usually unknown if it will be dereferenced in the future. The only possible exception is comparing the value of a deallocated pointer to some other pointer. In the above example, the if statement compares xyz against NULL and, in the assignment to b, there is a comparison of xyz against another_ptr. In contrast, the assignment to a and the call to foo do not involve any comparison.

Common causes include meaning to use the pointer before it is deallocated, or meaning to set the pointer to NULL or to new memory before it is used again after the deallocation.

Things to look at include where the memory is deallocated, and the path of the code until it is first used again. Most of the time, the pointer should be reassigned to NULL or another memory location before it is used again.
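One way to follow the advice of resetting the pointer is to combine deallocation and reassignment in a single step. The macro FREE_AND_CLEAR below is a hypothetical helper, shown as a sketch rather than as something the tool requires.

```c
#include <stdlib.h>

/* Hypothetical helper: release the memory and clear the pointer in one
 * step, so that later tests such as "if (p)" see NULL instead of a stale
 * address. free(NULL) is a no-op, so applying the macro twice to the same
 * pointer is harmless. */
#define FREE_AND_CLEAR(p) \
    do {                  \
        free(p);          \
        (p) = NULL;       \
    } while (0)
```

Note that this clears only the pointer named in the macro call. Any other pointer to the same memory still dangles and has to be handled separately.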

Avoid passing NULL

This complaint is followed by a path through the program. The path ends in a function call such as

f ( xyz );

At this point, xyz will be NULL, which is a problem because f is incapable of handling a NULL pointer.

Passing NULL to certain functions can result in unpredictable behavior.

Common causes include forgetting to check whether a pointer is NULL before passing it to a function, using the wrong function, or assuming that the function allows NULL pointers as arguments.

Look at the argument that is being passed into the function, and look back at its declaration or definition. Decide if you should be checking its value against NULL before using it in the function call.

Avoid passing a structure to a function with a variable number of arguments

The intent of this complaint is to point out something unexpected. Passing an object of structure type to a variable argument list is undefined in C and mostly undefined in C++. So probably something else was intended. Here is an example that C/C++ Data Flow Analysis encountered on several occasions:

class yasc { // Yet Another String Class
private:
    const char *name;
    size_t len;
public:
    const char *c_str() { return name; }
    // more stuff
};

yasc answer = "42"; printf(".. and the answer is '%s'\n", answer); // should be answer.c_str()

This code works by accident, because the data layout of the class happens to allocate name at offset zero. It will break in mysterious ways if, for example, name gets assigned a different offset by changing the order of the data members.

Passing a structure to a variable-argument function will place the entire structure on the stack, which is probably not what the function expects, and will usually lead to undefined behavior.

Common causes include passing a structure when you meant to pass the structure's address or one of the structure's members, using the wrong variable, using a variable with the wrong type, or calling the wrong function.

Look at the function's definition to see why it expects a variable number of arguments, and then look at the variable's declaration to see what its type is. Variable-argument functions should usually only be passed scalar types (built-in types, and pointers).

Avoid returning deallocated memory from a function

Because the resource (for example, memory) is not going to exist after the function returns, making a pointer to that resource available from outside the function is usually an error - and dereferencing that pointer will result in a random value.

Example 1:

char *bar(void) { char foo[100]; ... return foo; }

The array foo is allocated on the stack and will be deallocated just as the function returns, causing a pointer to deallocated memory to be returned.

Example 2:

void bar(void) { static int *foo; ... free ( foo ); }

This will leave foo pointing to deallocated memory, which may be a problem if foo is accessed later. It is generally recommended to write the code in this manner:

void bar(void) { static int *foo; ... free ( foo ); foo = NULL; }

This way, testing (foo == NULL) can be used to determine whether foo contains a valid pointer or not.

In Example 1, the deallocated memory was exposed via return. In Example 2, it was via a static variable. In general, this complaint is issued if a resource is allocated and then deallocated in a function - and a pointer to the resource is made accessible after a return from the function. There are 6 ways of exposing deallocated memory, all illustrated in this example:

extern char *Extern;
static char *Static_File;
static char **Heap;

char *foo(char **Parameter, int a)
{
    static char *Static_Local;
    char Local[10];
    Heap = (char **) malloc(sizeof(char *));
    switch (a) {
    case 1: *Heap = Local; break;
    case 2: *Parameter = Local; break;
    case 3: Extern = Local; break;
    case 4: Static_Local = Local; break;
    case 5: Static_File = Local; break;
    case 6: return Local;
    }
    return 0;
}

Common causes include forgetting to allocate the memory in a more permanent fashion (on the heap, for example), using the wrong address, or using a pointer instead of the dereferenced type.

Examine where the external address of the temporary memory is being assigned, and decide if it should remain a pointer or not. If so, you must change the memory that is pointed to so that it will remain after the function returns. If not, then you should change the type of the external assignment so that it is no longer a pointer to temporary memory.

The user may not consider all 6 ways equally serious.
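The two usual repairs for Example 1 (let the caller provide the storage, or allocate it on the heap) might look like the sketch below. The function names and the greeting text are hypothetical and serve only to show where the memory lives.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Variant 1: the caller owns the storage, so nothing local escapes.
 * Returns 1 if the greeting fit into buf, 0 otherwise. */
int greeting_into(char *buf, size_t size, const char *name)
{
    int n = snprintf(buf, size, "hello, %s", name);

    return n >= 0 && (size_t)n < size;
}

/* Variant 2: the result lives on the heap and outlives the call.
 * Returns NULL on allocation failure; the caller must free() the result. */
char *greeting_alloc(const char *name)
{
    size_t size = strlen("hello, ") + strlen(name) + 1;
    char *result = malloc(size);

    if (result != NULL) {
        snprintf(result, size, "hello, %s", name);
    }
    return result;
}
```

Variant 1 avoids allocation entirely but requires the caller to pick a buffer size; variant 2 sizes the buffer itself but transfers ownership, which the caller must then release.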

printf argument type is passed to the wrong kind of pattern

Example:

int xyz; ... printf("beginning of pattern %s rest of pattern", xyz);

will generate the complaint

printf argument 'xyz' of type 'int' is being passed to the underlined
beginning of pattern %s rest of pattern
                     ^^

Passing the wrong type to a format string will result in the type being converted incorrectly, which will result in garbage values in the final string.

Common causes include missing a format specification or an argument, which leaves the arguments mismatched with the specifications - or passing the wrong variable or value.

Look at the format pattern and ensure that the parameter's type matches what the format pattern expects.

In general, the following rules are assumed

In addition, anything matches void*.

The function printf is not the only one whose format will be checked. C/C++ Data Flow Analysis will check any function's arguments for incorrect format, provided that function is given a format attribute.
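As a small sketch of arguments that agree with their conversions, the hypothetical function format_sample below pairs each standard specifier with an argument of the matching type: %d with int, %ld with long, %zu with size_t, and %s with a string.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats one value of each kind with the conversion that matches its type.
 * Returns the number of characters snprintf reports, or a negative value
 * on error. */
int format_sample(char *buf, size_t size,
                  int i, long l, size_t z, const char *s)
{
    return snprintf(buf, size, "%d %ld %zu %s", i, l, z, s);
}
```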

Do not call printf with missing arguments

Example:

printf("beginning of pattern %s rest of pattern");

will generate the complaint

No printf argument to match the underlined
beginning of pattern %s rest of pattern
                     ^^

When there are not enough arguments to match all of the format specifications, the function will begin reading uninitialized stack values to fill their places. This will result in random values in the final string.

Common causes include forgetting to pass enough parameters to the format pattern - or forgetting to remove the specification from the format pattern, if it is no longer needed.

Ensure that the number of format specifications is the same as the number of parameters being passed to it.

Do not call printf with unused arguments

Example:

printf("pat", xyz);

will generate the complaint

Argument 'xyz' not used in the printf pattern pat

When more arguments are passed than the format pattern uses, the extra arguments are simply ignored. This will not result in unpredictable behavior; however, it will result in values not being added to the final string.

The problem is much more serious if the pattern uses the notation %5$d (instead of %d). This can result in an unused argument in the middle, which can result in unpredictable behavior.

Common causes include forgetting to add the specification to the pattern, or passing too many parameters to the function.

Ensure that the number of format specifications is the same as the number of parameters being passed to it.

Avoid invalid printf conversion specifications

Example:

long int xyz; ... printf("beginning of pattern %l rest of pattern", xyz);

will generate the complaint

The underlined is an invalid printf conversion specification
beginning of pattern %l rest of pattern
                     ^^

In the example, the underlined string should probably be %ld.

The format pattern contains a conversion specification that is unknown. This will usually cause the rest of the arguments to be off by one, since the parameter for this specification will not be used here.

Common causes include using the wrong conversion specifier (writing %l when you meant %ld for example), or writing a single percent sign to represent a literal percent sign in the output (which must be written as %% in the format pattern).

Look at the format specification and decide what you meant for it to be. Either change it so that it is a valid conversion specification or remove it from the format string. Also ensure that the arguments are still correct, given the new format string.

scanf argument must be a pointer

Example:

int xyz; scanf("%d", xyz);

should probably be

int xyz; scanf("%d", &xyz);

Since scanf dereferences its arguments as pointers, they must all be pointers. Passing anything else will usually lead to unpredictable behavior since scanf will treat the value as an address and attempt to write to that memory.

Common causes include forgetting to take the address of a variable and passing the variable itself, passing the wrong variable, or passing a variable of the wrong type.

Ensure that all of scanf's arguments are pointers to valid memory.

The function scanf is not the only one whose arguments will be checked. C/C++ Data Flow Analysis will check any function's arguments, provided that function is given a format attribute with kind=scanf.
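A minimal sketch of the corrected call pattern, using sscanf (the string-reading member of the same function family) so that it does not depend on standard input. The wrapper parse_int is a hypothetical name; it passes the address of the destination and checks the number of conversions performed.

```c
#include <stdio.h>

/* Parses a decimal integer from text into *out.
 * Returns 1 on success and 0 if no integer could be read, in which case
 * *out is left unchanged. */
int parse_int(const char *text, int *out)
{
    int value;

    if (sscanf(text, "%d", &value) != 1) {   /* &value: an address is required */
        return 0;
    }
    *out = value;
    return 1;
}
```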

Function lacks a return statement with a value

The named function was declared to return a value. However, there is a path through the function along which no value is returned. This can happen, for example, if control flow reaches the end of the function without encountering a return statement. For example,

int foo()
{
    if (a != 0)
        return 10;
    --a;
}   <-- complaint

In this situation, the compiler will arrange for a return statement under the covers but it will not return any value. Hence, a random value is returned to the caller.

This rule is also issued for explicit return statements that do not return anything, as in this example:

int foo()
{
    if (a != 0)
        return;     <-- complaint
    --a;
    return 10;
}

This error is also often found in functions that have no declared return type. The compiler then assumes an implicit return type of int and not void.

Look at the path through the function and decide where a return statement should go, and what should be returned.

Do not use statements that always fail

This is a statement (typically an assignment) that is normally not meant to fail, yet it always results in a runtime error.

Example:

#define M(x) x == 0 ? 10 : x == 1 ? 20 : (abort(), 0)
y = M(2);

Causes of this failure include the functions abort, fail, and several project-specific functions.

Look at the statement and decide why it will always fail. For some reason, all possible paths lead to failure, which is rarely desired. Figure out which path(s) should not have led to failure, and make the appropriate change.

Always return a value in functions with non-void return types

This complaint is issued for explicitly-written return statements that do not return a value, although the containing function has a non-void return type.

Example:

int foo() { ... return; }

The return statement specifies no return value, therefore the function will return a random value.

This error happens often in legacy C/C++ code where functions that do not return a value were declared without a return type, such as:

foo() { ... return; }

In this case, the compiler will give the function an implicit int return type and not, as might be expected, a void type. Declaring functions in this manner is a deprecated practice and, in C++, it is an error. Code that uses this old-style declaration should be changed to use void instead.

Argument of function should be a pointer

Example:

struct s {int a; int b;} S; ... free(S);

The function free() expects a pointer. The above example would be a compile-time error if the compiler had the standard declaration of free() as in stdlib.h. However, in the absence of a declaration for free(), the compiler does not reject it. The compiler will assume that the function free takes struct s as argument.

A common cause of this error is typing free(*p) instead of free(p).

Never divide by 0

Division by 0 typically results in an execution exception. If C/C++ Data Flow Analysis finds the possibility of this error, this result will be issued and an execution path that would lead to the error will be provided.

Avoid resource and memory leaks - and always close files

This complaint is followed by a path through the program. Along this path, a resource is obtained (for example, memory is allocated or a file is opened) and becomes inaccessible before it can be released (for example, memory is freed or the file is closed). It becomes inaccessible because variables containing a pointer to the resource receive new values or become deallocated themselves.

As an example, consider a memory leak. Losing all pointers to allocated memory means that the memory can never be freed. This memory leak can cause the application to run out of memory if it happens repeatedly. If the memory was an object, its destructor will never be called, which can also lead to incorrect or missing behavior.

Common causes include forgetting to deallocate the memory before assigning over the pointer, forgetting to save the pointer somewhere, or allocating memory when you meant to simply use a pointer to memory that was already allocated somewhere else.

Look at the memory allocation, and watch the pointer. Before it is assigned over, it should be either deallocated or stored somewhere. Resource leaks in the main() function are not really a problem because resources will be freed by the operating system upon return from main() anyway.
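A frequent instance of assigning over the only pointer is writing p = realloc(p, n): when realloc fails it returns NULL and the original block is lost. The sketch below, with the hypothetical helper resize_ints, keeps the old pointer until the new one is known to be valid.

```c
#include <stdint.h>
#include <stdlib.h>

/* Resizes *items to hold count ints.
 * Returns 1 on success. On failure returns 0 and leaves *items untouched,
 * so the caller still holds the only pointer and can free it.
 * A count of 0 is rejected because realloc with size 0 is not portable. */
int resize_ints(int **items, size_t count)
{
    int *grown;

    if (count == 0 || count > SIZE_MAX / sizeof **items) {
        return 0;
    }
    /* Writing "*items = realloc(*items, ...)" would overwrite the only
     * pointer with NULL on failure and leak the original block. */
    grown = realloc(*items, count * sizeof **items);
    if (grown == NULL) {
        return 0;
    }
    *items = grown;
    return 1;
}
```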

Avoid uninitialized members

This rule should be enabled if you intend constructors to initialize all class members. The complaint will be issued if there is a path through a constructor along which a member is not initialized.

The same kind of conservatism applies here as for uninitialized variables, in general. In particular, this complaint will be suppressed if the constructor calls a method whose side-effects C/C++ Data Flow Analysis does not understand, because that method might perform all initialization.

Always allocate and deallocate memory consistently

Example 1:

a = new int; free(a);

For this code, C/C++ Data Flow Analysis would generate the following complaint

-- ERROR26(memory allocation source) /*wrong deallocator*/ >>>ERROR26_f_a19412af0a39ba
"p.C", line 11: 'a' is not 'from malloc'
ONE POSSIBLE PATH LEADING TO THE ERROR:
"p.C", line 10: 'operator new' generates a value that is 'from new'
"p.C", line 11: 'free' requires its argument to be 'from malloc'

When memory is allocated via one mechanism, it needs to be freed via the matching mechanism. This is because each mechanism (for example, malloc/free or new/delete) allocates and deallocates memory in a certain way, which might be drastically different from the others. Using the wrong one can lead to anything from a memory leak to incorrect behavior or corrupt memory.

C/C++ Data Flow Analysis knows about this rule of pairing allocators and deallocators because they have been assigned certain values of the property memory allocation source. Properties can be assigned to functions using attributes.

Besides allocators and deallocators, other properties can be assigned. For example, you can have a sortedness property - a function sort might declare its output to have the value sorted, and a function binary_search might require its input to have the value sorted. Another example would be a security property - the function fgets might declare its output to have the value not_trusted, and a function system might require its input to have the value trusted.

This rule is parametrized by the name of the property violated, provided the property does have a name. This allows you to control which properties are checked.

Always terminate function arguments with NULL where required

Some functions that take a variable number of arguments rely on a terminating NULL. In this case, a terminating NULL was not found.

Ensure that you meant to call this function that expects NULL as its last argument, and then terminate the argument list with NULL.
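To show why the terminator matters, here is a sketch of a hypothetical variadic function, total_length, that walks its arguments until it reaches NULL. Without the terminator, the loop would keep reading arguments that were never passed.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variadic function: sums the lengths of its string
 * arguments. The argument list must end with a NULL pointer of type
 * const char *. */
size_t total_length(const char *first, ...)
{
    size_t total = 0;
    va_list args;

    va_start(args, first);
    for (const char *s = first; s != NULL; s = va_arg(args, const char *)) {
        total += strlen(s);
    }
    va_end(args);
    return total;
}
```

A call would look like total_length("ab", "cde", (const char *)NULL). The cast is deliberate: NULL may be defined as a plain integer 0, which is not guaranteed to be passed through a variable argument list as a pointer.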

Do not access printf arguments both randomly and sequentially

Example:

printf("print arg2 by random access: %2$d, sequential access: %d", abc, uvw, xyz);

will generate the complaint

It is not allowed to access printf arguments both randomly and sequentially
print arg2 by random access: %2$d, sequential access: %d
                             ^^                       ^^

If a format pattern uses the random method of accessing arguments, then it must use that method throughout, or unpredictable results may occur.

The complaint prints the format pattern and underlines two argument accesses, one random and one sequential, indicating the inconsistency.

Always handle exceptions properly

An exception may be thrown, either explicitly via throw(), or implicitly by calling a function which throws exceptions. This thrown exception is not caught inside the current function, and the current function is not permitted to throw it because it is not declared in the function's exception specification list.

Example:

void foo() throw(int)
{
    ...
    bar(); // declared to throw double
    ...
}

When a function declares an empty exception specification, it may not throw any exceptions. When a function does not declare an exception specification at all, it may throw any exception.

Example:

void foo() throw(); // guaranteed not to throw an exception
void bar();         // may throw any exception

However, it makes the most sense to assume that, if you are calling a function with no exception specification, it will not throw anything. It also makes sense to assume that the function you are in may throw anything, if it does not list an exception specification.

Do not rethrow an exception without an exception handler

It is valid to call throw() with no throw expression while within an exception handler. This will re-throw the exception that is currently being handled by the exception handler. However, calling throw() outside of an exception handler will immediately call the special function terminate(), which will terminate the application.

Always handle exceptions that may be thrown as a result of variable initialization/deconstruction during exception handling

When an exception is thrown, events such as initializing the exception object that was thrown, unwinding the stack to the correct exception handler, and initializing the exception handler's parameter can occur.

If a second exception is raised during these tasks, before the exception handler for the original exception is entered, then the special function terminate() is called, which will terminate the application.

It is recommended that you not use classes as exception objects if their copy constructors may throw an exception, because the copy constructors may be invoked to initialize the thrown object or the parameter to the exception handler.

You should also avoid declaring an object on the stack if its destructor may throw an exception, and if it is along a path that will be unwound when a later exception is thrown.

Here are examples of both:

class A {
public:
    A();
    A(const A &) throw(int);
    ~A() throw(double);
};

void foo(A obj)
{
    ...
    throw(obj); // copy constructor may throw 'int' during initialization
    ...
}

void bar()
{
    A obj; // destructor may throw 'double' during stack unwind
    ...
    throw(...);
}

Avoid shifting outside of the allowed range

Example:

int i = x; i <<= 32;

The C and C++ standards define the result of a shift operation only for non-negative shift amounts that are smaller than the width of the left operand. For all other shift amounts, the behavior is undefined, which means that the result is unpredictable and different compilers may produce different results. For instance, the value of i in the above example can be anything (assuming that int is at most 32 bits wide). Certain versions of xlc will produce a result of 0, whereas certain versions of GCC will leave the value of i unchanged.

If you are relying on one of these behaviors then you need to ensure it by other means, perhaps a special shift function or macro.
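Such a special shift function might look like the following sketch. The name shift_left32 and the choice of returning 0 for oversized shift amounts are assumptions made for the illustration; a project relying on the other behavior would return value unchanged instead.

```c
#include <stdint.h>

/* Left shift with a defined result for every shift amount:
 * amounts of 32 or more yield 0 instead of undefined behavior. */
uint32_t shift_left32(uint32_t value, unsigned amount)
{
    return amount < 32 ? value << amount : 0;
}
```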

Avoid assertions that always fail

Example 1:

if (x < 0) x = -x; assert(x > 0);

The above assertion will fail for x == 0.

It is recommended that this rule not be enabled because there are inadequate assert facilities in most languages. Consider the following.

Example 2:

file = fopen("xyz", "r"); assert(file);

Suppose that your coding guidelines require that nobody rely on fopen() succeeding, and you expressed it using the force_test function attribute. That attribute tells C/C++ Data Flow Analysis that fopen() can return NULL. As a result, C/C++ Data Flow Analysis will issue Avoid assertions that always fail because the above assertion could fail.

The issue is that the above assert will fail if a file is missing, which is not an error in your code. You would like C/C++ Data Flow Analysis to issue complaints regarding asserts about your code only, but there is no facility to distinguish them. More generally, an assert claims that an expression will be true for all values of inputs, without specifying which inputs. Ideally we would like to distinguish among asserts on inputs into a function, inputs into a larger module, inputs into the whole program, etc. Until there is such a distinction, it is not useful to check assertions statically.

Avoid expressions that modify a location that may be assigned or used by another expression

Example 1:

i = 5; a[i++] = i;

The C and C++ language standards give compilers considerable leeway in reordering expressions for evaluation. In the above example, a compiler is free to first increment i in the array index and only then fetch the contents of i on the right-hand side of the assignment. Or, the compiler is free to delay storing the new value of i until the end of the evaluation of the entire expression. Because of this, the effect of the example above is undefined, and no logical result can be assumed.

For this reason the language standard disallows certain expressions such as the array assignment in Example 1. The rules disallow even certain expressions where the final outcome would happen to be the same no matter what the order of evaluation, as in

Example 2:

i++ + i++; i = i = 5;

Since the standard declares that the above expressions are undefined, a compiler is free to, for example, set i to 42, or do nothing with it. Therefore C/C++ Data Flow Analysis flags expressions like Example 2 as well.

The exact rules of undefined expressions are rather involved and are not reproduced here. But you are encouraged to look them up in your language document, usually under sequence points. If you find the rules too complicated, a simple and safe rule of thumb is to avoid sub-expressions with side-effects.

C/C++ Data Flow Analysis does not guarantee to find all violations, as illustrated in the following example.

Example 3:

a[i] = a[j]++;

Example 3 is illegal if i=j, but C/C++ Data Flow Analysis will flag it only if it can determine that i=j from local context.

Avoid NULL function pointers

This complaint is followed by a path through the program. This path ends in a function call, for example foo(123), where foo is a function pointer whose value is NULL. This call would result in unpredictable behavior due to an invalid instruction.

This usually happens inside a function bar whose parameter is a function pointer foo. If the caller of bar could pass NULL as the value of foo, then foo needs to be checked for NULL before it is used in a call.
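A minimal sketch of such a check follows. The names bar and foo come from the description above; the callee twice, the fallback parameter, and the signatures are assumptions made only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callee standing in for whatever foo points to. */
int twice(int x)
{
    return 2 * x;
}

/* bar receives a function pointer that callers may leave NULL, so it
   checks the pointer before calling and returns a fallback otherwise. */
int bar(int (*foo)(int), int arg, int fallback)
{
    if (foo == NULL)
        return fallback;
    return foo(arg);
}
```

With the check in place, no path through bar can reach the call with a NULL value of foo.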

Avoid right shifting negative numbers

The C standard, and the C++ standard before C++20, leave the result of a right shift applied to a negative number implementation-defined. That means that different compilers may produce different results: some propagate the sign bit (an arithmetic shift), while others shift in zeros (a logical shift). Code that relies on one of these behaviors is not portable. In this implementation-defined case, C/C++ Data Flow Analysis follows the behavior of the host machine that it is running on.
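One way to avoid depending on how a particular compiler shifts negative values is to shift only non-negative operands. The sketch below is an illustration, not part of the tool; the function name is hypothetical, and it assumes a two's complement representation and a shift count n with 0 <= n < the width of int.

```c
#include <assert.h>

/* Arithmetic (sign-preserving) right shift that never shifts a negative
   operand. Assumes two's complement and 0 <= n < width of int. */
int shift_right_arithmetic(int x, int n)
{
    if (x >= 0)
        return x >> n;     /* well defined for non-negative x */
    return ~(~x >> n);     /* ~x is non-negative whenever x is negative */
}
```

For negative x the result rounds toward negative infinity, which matches what an arithmetic shift produces on most hardware.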

Mistakes:

Avoid statements, expressions and dereferences that have no effect

Example 1: (Probably missing '=')

x + 2; /* Statement has no effect */

Example 2: (Probably missing '=')

x << 1; /* Statement has no effect */

Example 3: (The address of a variable is never 0)

char x[100]; assert(x); /* Statement has no effect */

Example 4: (Using comma instead of '=')

x, y = 0; /* Expression 'x' has no effect */

Example 5: (Using comma instead of '&&')

for (I = 0, J = 0; I < N, J < M; I++, J++) /* Expression 'I < N' has no effect */

Example 6: (Precedence error: *a++ instead of (*a)++)

*a++; /* The value computed by '*(a++)' is not used */

Example 7: (Whole statement has an effect, but last operation '+' does not)

x + foo(); /* The value computed by 'x + foo()' is not used */

In all of these cases, valid code results in something being lost. Either the only effect of the expression is being ignored, or there are effects of the expression that are being discarded unintentionally, or the expression is unnecessary altogether.

Look at the expression in question and see if it comes close to any of these examples. Also look to see if the expression is really doing what it was meant to do, and that the result of the expression is really being used.

Avoid unreachable statements

This complaint is issued because programmers do not typically intend to write unreachable statements. Unreachable statements are usually the result of:

a label that is not the target of any goto, or
a test of an expression that is always true or always false, or
an impossible case in a switch statement, or
a statement preceded by an unconditional branch

Example: (The label L is unreachable)

if (0 == 1) goto L; return 0; L: return 1;

Another example of an unreachable statement is here

typedef enum {red, green} color; int f(color c) { switch(c) { case red: return 0; case green: return 1; default: return 2; } }

The default case will be flagged as unreachable, provided the type color is "clean". That means the variable c cannot have a value other than one listed by the enumerators of the type color.

Look at the unreachable statements, and decide under which circumstances they really should be reachable. If they should never be reachable, think about removing them altogether.

Note that this complaint is not issued for statements that have no effect. The rationale is that such statements can be removed anyway without changing the program behavior, and therefore the fact that they cannot be reached is not interesting.

A handler of exceptions may appear unreachable just because functions are not required to specify all the exceptions they can throw. Consider this example.

try { foo1(); foo2(); cleanup(); } catch (...) { return; }

Avoid unreachable statements is usually accompanied by Avoid statements that always evaluate to the same value (an impossible test branch) or Avoid labels that are not referred to by any goto statement. If you decide to turn on Avoid unreachable statements, it is a good idea to also turn on the other two as they may provide an explanation of the cause.

Relation == binds more tightly than & and should be avoided

Operator precedence defined by the programming language is not always expected by programmers. In this case, what was written is probably not what was meant, and should be inspected for correctness.

Common causes include assuming the wrong precedence, using the wrong operator, or forgetting parentheses around part of the expression.

Example:

if (A&B == 0)

should probably be

if ((A&B) == 0)

In the above example the programmer probably expected '&' to bind more tightly than '==', which is the opposite of the C language definition.

Examine the expression and decide which precedence was intended. When in doubt as to what the language defines the precedence to be, use extra parentheses.
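The difference between the two readings can be seen in a small sketch. Both function names are hypothetical; the second function spells out how C parses the expression when the parentheses are omitted.

```c
#include <assert.h>

/* What the programmer most likely meant: do a and b share no set bits? */
int no_common_bits(unsigned a, unsigned b)
{
    return (a & b) == 0;
}

/* What "a & b == 0" means in C: == is evaluated first, so the
   expression is a & (b == 0). */
int as_written_without_parentheses(unsigned a, unsigned b)
{
    return (int)(a & (unsigned)(b == 0));
}
```

For a = 2 and b = 0 the first function reports true, while the expression as written yields 2 & 1, which is 0.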

Always consider adding break statements to cases

When executing a case statement, the execution path will continue through all case statements until a break is encountered. This means that more than one case statement might be run accidentally.

Example:

switch (A) { case 0: B = 0; case 1: B = 1; }

Without a break statement, after assigning 0 to B, 1 will be assigned to B.

Look at your case statements. If you intend to fall through from one statement to the next, consider marking this error innocent. Otherwise, insert the appropriate break statements so that the execution path is correct.

Sometimes it is intended to fall from one case to the next. The most common example is

switch (A) { case 0: case 1: B = 1; }
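The effect of the missing break can be checked with a short sketch. The function names and the initial value -1 are assumptions for illustration; the switch bodies mirror the first example above.

```c
#include <assert.h>

/* Mirrors the example: case 0 falls through into case 1. */
int assign_without_break(int a)
{
    int b = -1;
    switch (a) {
    case 0: b = 0;   /* no break, execution continues into case 1 */
    case 1: b = 1;
    }
    return b;
}

/* Same switch with a break ending every case. */
int assign_with_break(int a)
{
    int b = -1;
    switch (a) {
    case 0: b = 0; break;
    case 1: b = 1; break;
    }
    return b;
}
```

Calling both functions with 0 shows the difference: the first returns 1 because of the fall-through, the second returns 0.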

Avoid statements that always evaluate to the same value

Normally people do not mean to test conditions that are always true or always false.

Example 1:

if (strlen(S) >= 0) /* Condition is always true */

Example 2:

typedef enum {A, B} AB; void foo(AB ab) { switch(ab) { case 0: break; /* Case is possible */ case 1: break; /* Case is possible */ case 2: break; /* Case is not possible */ default: break; /* Default is not possible */ } }

Common causes include testing the wrong conditional, using the wrong operator (= instead of ==), using operators that evaluate in an unexpected order, or testing a condition that has already been tested for.

condition_implied_from_local
- Example:
  - if (x >= MIN || x <= MAX) ...
  - This is an example of a common mistake when testing whether x is in the range between MIN and MAX. (The mistake is in writing || instead of &&). The symptom of the mistake is that the expression x <= MAX is always true whenever evaluated; and it is evaluated only if x >= MIN is false.
  - Another example:
  - if (x > A || x <= A && y > B) ...
  - This is a common coding practice where the unnecessary condition x <= A is added for clarity.
condition_implied_from_non_local
- Example:
  - if (x > 5) { if (x >= 5) ... }
  - The condition x >= 5 is always true because it is executed only if x > 5, and C/C++ Data Flow Analysis will identify that reason. A common cause is testing the wrong variable in one of the two conditions. Sometimes unnecessary conditions are tested just to make sure, and if this is your coding style you need to disable this category.
condition_implied_from_declaration
- Example:
  - char c; for (c = 0; c < 256; c++)
  - This is a common mistake resulting in an infinite loop, whose symptom is that the condition c < 256 is always true. It is always true independently of any other conditions, and the reasons can be ascertained by looking at the variables involved and their declarations.
condition_implied_from_sizeof
- Example:
  - if (sizeof(int) == sizeof(long))
  - On any particular machine this condition will be always true or always false.
condition_implied_from_operation_on_literals
- Example:
  - if (1 + 1 == 2)
  - This type of condition usually occurs as a result of macro expansion.
condition_implied_from_literal
- Example:
  - if (1) ... if (x == 0 || 1) ...
  - The condition is an explicit literal constant. The first example is common and usually intended. The second example is also common and rarely intended.
  - You probably want to enable checking for the second one and disable the first one. For that reason we distinguish between whole_condition_implied_from_literal and sub_condition_implied_from_literal as explained below.
condition_implied_from_right_of_assignment
- Example:
  - if (x = 1) ...
  - Writing = instead of == often results in a condition that is always true or always false. The reason is that the value of the condition is the value of the right hand side of the assignment. This category applies to all assignments, independently of why the right hand side is always true or always false.
condition_implied_from_call
- Example:
  - if (foo()) ...
  - If foo() can never return 0 then the condition is always true. C/C++ Data Flow Analysis knows about the possible return values of foo() either from a user declared attribute or from examining the body of foo().
condition_inexplicable
- None of the above categories applies.
  - Example:
  - if ((x || y) && (!x || y) && (x || !y) && (!x || !y))
  - In every particular case you might see the reason and wonder why C/C++ Data Flow Analysis could not print the reason. We are trying to improve our diagnostic abilities, but we will never be able to do a good job in all cases. You may find this example particularly surprising:
  - x = 0; if (x == 0) ..
  - For reasons too difficult to explain C/C++ Data Flow Analysis is unable to pinpoint the assignment as the cause why x == 0 is always true.

For example,

if (1) ... if (x == 0 || 1) ... case 1:

The condition 1 appears as whole_condition in the first if statement, as sub_condition in the second, and as case_condition in the third.
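Returning to the range test shown under condition_implied_from_local, a corrected version might look like the sketch below. The function name in_range and its parameters are hypothetical, and the sketch assumes min <= max.

```c
#include <assert.h>

/* Range test written with &&. With || instead, the second comparison
   would be evaluated only when x < min, where it is always true
   (assuming min <= max). */
int in_range(int x, int min, int max)
{
    return x >= min && x <= max;
}
```

Both comparisons now contribute to the result, so neither is implied by the other.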

Ensure that the third argument of strncmp is the length of the smallest string

When using strncmp, the length argument is usually meant to be the length of the smaller of the two strings.

Example:

if (!strncmp(S, "abcd", 3))

Here the third argument should probably be 4. Otherwise S is being compared against "abc" only.

Thus, by default the complaint will be suppressed for

if (!strncmp(S, "abcd", 5))

which is equivalent to

if (!strcmp(S, "abcd"))

The reason for this default setting is that there is a common belief that the function strcmp is not secure and strncmp should be used instead. That is not true -- strcmp is just as secure as strncmp; this belief probably originated from the fact that strcpy is not as secure as strncpy. Nevertheless, to avoid complaining about the misguided use of strncmp we suppress that special case.
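One way to keep the third argument in step with the literal is to derive it from the literal itself. This is only a sketch; the macro PREFIX and the function name are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define PREFIX "abcd"

/* sizeof(PREFIX) counts the terminating '\0', so subtracting 1 gives the
   number of characters in the literal (4 here). */
int starts_with_prefix(const char *s)
{
    return strncmp(s, PREFIX, sizeof(PREFIX) - 1) == 0;
}
```

If the literal changes, the length passed to strncmp changes with it, so the comparison cannot silently cover only part of the string.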

Avoid reusing loop index variables within nested loops

Using the same variable to control nested loops can cause unwanted side effects on loop conditional tests.

Example 1:

for (I = 0; I < N; I++) for (I = 0; I < M; I++)

will cause the complaint

"f.C", line 123: The variable 'I' already controls the loop on line 122

Example 2:

while (I < N) for (I = 0; I < M; I++)

Again the inner loop has a side-effect on the test of the outer while loop. However, this situation is more often intended than the situation in Example 1.

No complaint will be issued if the inner for-loop contains no initializer, as in this example.

while (I < N) for (; I < M; I++)

The lack of initializer in a for-loop is an indication that the programmer does intend the side-effect.

Check the variables that are controlling the nested loops, and be sure that they should be the same variable. If not, change them so that the loops have separate controlling variables.

Consider using sizeof(type) when calling malloc

This check complains if the size of the allocated memory looks suspicious. For instance:

long *Array = malloc(11 * sizeof(long *));

should probably be

long *Array = malloc(11 * sizeof(long));

Likewise:

Array = realloc(Array, 64);

should probably be

Array = realloc(Array, 64 * sizeof(long));
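A common idiom that avoids naming the wrong type is to take sizeof of the object the pointer refers to. The sketch below illustrates that idiom and is not a description of what the check requires; the function names are hypothetical, and the resize helper assumes n > 0.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates n longs. sizeof *array follows the declared type of the
   pointer, so it cannot drift to sizeof(long *). Returns NULL on failure. */
long *make_long_array(size_t n)
{
    long *array = malloc(n * sizeof *array);
    return array;
}

/* Resizes through a temporary so the old block is not lost on failure.
   Returns 0 on success and -1 on failure. Assumes n > 0. */
int resize_long_array(long **array, size_t n)
{
    long *tmp = realloc(*array, n * sizeof **array);
    if (tmp == NULL)
        return -1;
    *array = tmp;
    return 0;
}
```

The caller remains responsible for passing the final pointer to free.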

The multiple_or_sizeof policy is useful when you allocate structures with an extensible array at the end. Like so:

struct foo { int n; int array[1]; } *var; var = malloc(sizeof(struct foo) + 7 * sizeof(int));

Assuming 4-byte integers, the allocated size is 36 bytes and therefore not a multiple of the size of struct foo, which is 8 bytes. With default settings C/C++ Data Flow Analysis will produce a complaint, which is not what is desired here.

Avoid trying to allocate invalid amounts of memory

Passing an invalid value to an allocator can result in anything from failing to get the intended memory to crashing the application. Some functions allocating memory are unable to satisfy a request for 0 bytes or a negative amount of bytes. If this happens, this rule is issued.

Common causes include not checking the size being passed to an allocator, or allocating a negative amount of memory by mistake.

Things to look for include where the size being allocated is calculated, and why it is invalid. Ensure that the request contains a valid size, so that memory can be returned successfully.

C/C++ Data Flow Analysis recognizes a function as allocating memory when it is given such an attribute. Part of the attribute specifies what the function will do when asked for 0 bytes or for a negative number of bytes.
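A size check in front of the allocator is one way to rule out such requests. The wrapper below is a hypothetical sketch; its name and the choice of a signed long parameter are assumptions made so that a negative size can be represented at all.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical wrapper: rejects zero and negative sizes before they reach
   malloc, where a negative value would convert to a huge size_t. */
void *alloc_checked(long nbytes)
{
    if (nbytes <= 0)
        return NULL;
    return malloc((size_t)nbytes);
}
```

Callers still need to handle a NULL result, which now covers both an invalid size and an ordinary allocation failure.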

Avoid invoking a function more than once with identical arguments

This is usually a symptom of unintentionally invoking a function more than once with the same arguments. Invoking a function in such a manner (if it was unintended) is inefficient, and may be incorrect if the function has side-effects. C/C++ Data Flow Analysis will issue the complaint iff the function invocations stem from a macro expansion.

Example:

#define MIN(a,b) a < b ? a : b ... MIN(foo(), 5)

The macro MIN may invoke the function foo twice. It is better to code it like this:

t = foo(); MIN(t, 5)
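The double invocation can be made visible with a call counter. In this sketch the counter foo_calls and the body of foo are hypothetical, and the macro is written with full parentheses, which is an addition to the original example.

```c
#include <assert.h>

/* Fully parenthesized variant of the MIN macro (the parentheses are an
   addition here, not part of the original example). */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical stand-in for foo(): counts how often it is invoked. */
int foo_calls = 0;
int foo(void)
{
    foo_calls++;
    return 3;
}

/* The macro expands foo() twice, and both copies run when foo() < 5. */
int min_direct(void)
{
    return MIN(foo(), 5);
}

/* Calling foo() once and passing the temporary avoids the repeat. */
int min_with_temporary(void)
{
    int t = foo();
    return MIN(t, 5);
}
```

Both functions return the same value, but min_direct invokes foo twice while min_with_temporary invokes it once.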

Consider using fabs instead of abs to avoid precision loss

The function abs converts its argument to an integer before computing the absolute value. In the cases where the argument is a float or double (or long), precision will be lost.

Example:

double d = 1.5; double a = abs(d);

This was probably meant to be fabs, which handles floating point numbers.

In general, ensure that the type of the variable you are passing to certain functions is compatible with the type of the argument that the function expects.

The function abs is only an example of functions that could cause unexpected loss of precision.

Avoid assigning the same value to enumerators within an enum

Most of the time, enumerators are meant to contain distinct values. This complaint is issued when an enumeration maps more than one enumerator to the same value.

Example:

enum { A, B, C, D = 0x01, E };

C/C++ Data Flow Analysis will issue the following two complaints

The enumerator D has the same value 1 as B
The enumerator E has the same value 2 as C

Common causes include assigning the wrong value to one of the enumerators - or assigning any values at all. If the repetition was intended, it should be marked innocent, or (if it happens commonly throughout this code), this complaint should be disabled.

Avoid implicit function declarations

The C language does not require that functions be declared before use, but it is generally recommended. The C compiler will issue a warning for undeclared functions and the linker will actually fail if an implementation of the function is not provided.

Therefore this complaint is not intended to catch programming errors, but rather failures to provide some include files. It is recommended that this complaint be turned on because otherwise C/C++ Data Flow Analysis may issue other complaints, which the user might find hard to understand. The real cause might be a missing include file or incorrect definitions given to C/C++ Data Flow Analysis.

Common causes include misspelled function or type names, or missing include files. Determine if the correct function or type is being used, and if so, include the appropriate header files that define that function or type.

If many functions are intended to be undeclared, then this complaint could result in too much noise.

In any case this complaint is issued only once per source file and undeclared function.

Avoid empty if-then-else statements

Example 1:

while (a > b); foo();

Example 2:

for (p = str; *p != '?'; p++); foo();

Example 3:

switch (x); { case 0: ... }

Example 4:

{ if (a > b); }

Example 5:

{ if (a > b); else x++; }

There are several language constructs such as while, for, switch, and if, that apply to the next single statement or block of code. A semicolon denotes the end of a single statement, and therefore a spurious semicolon after one of these constructs will give those constructs an empty body.

A human eye is bad at noticing spurious semi-colons and sees the next non-empty statement as the body of the construct. At the same time it is possible that the semi-colon is actually an intentional expression of an empty body. In Example 1 the call to foo() is probably meant to be executed on every iteration of the while-loop, but in Example 2 the same call is probably meant to be executed only after the for-loop. That intention is conveyed by indentation. Similarly the semi-colon in Example 3 is almost surely unintended, judging from the following block of code. (Please note that Example 3 is syntactically correct in C/C++, provided it is embedded in another switch statement.)

An if-statement is somewhat different in that it applies to two statements -- then and else clauses. While Example 4 shows an if-statement with no effect, the semi-colon in Example 5 is quite surely intended because it would not be syntactically correct otherwise.

Avoid empty if-then-else statements reports spurious semi-colons.

Loop will be executed at most once

If a loop is encountered that will never execute more than once, then there is an error in the logic that decides when to exit the loop or the loop construct was not needed.

Example:

for (i = 0; i < n; i++) { ... some statements not containing "continue" ... break; }

This loop will break on the very first iteration, therefore the loop construct is not needed. This is sometimes an indication of an error in the conditions for terminating the loop.

The exact cause for this error is tough to track down sometimes. Some things to look for include any break or continue statements in the loop, and under what conditions they are encountered. Also look at the loop's conditional statement, and ensure that it is correct and executes the loop as many times as you expect it to. When all else fails, use a debugging technique to watch the variables and the loop iterations to better understand what is happening.

Avoid comparing pointers rather than the objects themselves

Example 1:

void foo(char *string1, char *string2) { if (string1 == string2) ... }

This will compare whether string1 and string2 are identical pointers and will evaluate to false whenever the pointers are different, even if their two character strings are the same. If you meant to check whether the strings were the same, you want to write it as

void foo(char *string1, char *string2) { if (strcmp(string1, string2) == 0) ... }

Example 2:

void foo(char *string1) { char *string2 = strstr(string1, "xyz"); if (string2 == NULL) ... if (string1 == string2) ... }

In Example 2, Avoid comparing pointers rather than the objects themselves will not be issued regarding the comparison with NULL. The complaint will be issued regarding the comparison of string1 and string2.

Example 3: Suppose

typedef String T;

Then the complaint will be issued when comparing expressions of type T in addition to expressions of type String.

Avoid failing to copy the terminating '\0'

You are taking advantage of a feature of the C language that is disallowed in C++ because it can have unpredictable results.

Example:

char s[2] = "ab";

In C this initializes the array 's' to the string "ab", but there is no room for the terminating 0 character. Therefore the terminating 0 character will not be copied into 's' and 's' will not be a legal C-string. This is rarely intended; usually the following was meant:

char s[3] = "ab";

Exception handler for type is unreachable

The exception handler is unreachable because a previous exception handler is masking it.

Example:

try { throw(...); } catch(const char *) { ... } catch(char *) { /* unreachable handler */ ... }

Because the handler for const char * will also catch char *, the handler for char * is unreachable.

This complaint exists because there are special rules for catching exceptions that allow unexpected types to be caught at unexpected times.

Avoid functions that have a category advisory

The function being called has an advisory associated with it (see the advisory attribute). The advisory text should explain why the advisory was issued and what steps should be taken to avoid the advisory in the future. In most cases, the advisory will suggest calling a different function.

The category of the advisory is a free-form string which should describe the severity of the advisory. The category is also used to filter out unwanted advisory messages (see below).

Example:

char buff[64]; gets(buff);

This will print an advisory for the call to gets because calling gets is unsafe, and fgets should be used instead.

switch does not consider all cases

Example 1:

typedef enum {A, B, C, D} E; int foo(E e) { switch(e) { case A: return 1; case B: return 2; } return 0; }

C/C++ Data Flow Analysis will issue the following complaint

"foo.c", line 20: switch does not consider all cases Missing C, D

This rule is meant to support the following policy toward default clauses in switch statements.

(I) If you intentionally did not list all possibilities and you do want the switch statement to have no effect for the omitted cases, then clarify your intention by

default: break;

(II) If you intentionally did not list all possibilities because you are relying on callers of your function foo(E e) to pass only listed cases, then clarify your intention by

default: abort(); /* or something like it */

(III) If you intended to list all possibilities then omit any default clause and let C/C++ Data Flow Analysis tell you if you missed something.

Example 2:

typedef enum {A, B, C, D} E; int foo(E e) { if (e == C || e == D) return 0; switch(e) { case A: return 1; case B: return 2; } return 0; }

In Example 2 all possible values of e are meant to be considered inside the function foo(E e) and therefore it is best to omit the default clause. The advantage of omitting the default clause will be realized if somebody adds another value to the enum type E, and C/C++ Data Flow Analysis will be able to report that the new value is not being handled.

Avoid using boolean expressions in non-boolean contexts

This mistake is intended to catch unintentional uses of boolean values. In some cases, the use may be unintentional because of a simple typo:

if (!x & y)

In other cases, the use may be unintentional because the wrong operator was used:

if (~(x && y))

Examine the boolean expression and the context in which it is used, and decide if you really meant to use it in a non-boolean context.
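For the first example, the sketch below contrasts the expression as written with one plausible intent. The function names are hypothetical, and it is an assumption that the logical operator && was meant.

```c
#include <assert.h>

/* As written: ! binds more tightly than &, so this is (!x) & y, a
   bitwise and of a 0/1 value with y. */
int as_written(int x, int y)
{
    return !x & y;
}

/* Assumed intent: x is zero and y is non-zero. */
int x_clear_and_y_set(int x, int y)
{
    return !x && y;
}
```

The two agree when y is 0 or 1 but differ for other values: with x = 0 and y = 2 the bitwise form yields 1 & 2, which is 0, while the logical form is true.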

Warnings:

Avoid unused static variables with file scopes

The value of the static variable xyz is never used, which is sometimes an indication of forgotten functionality.

This complaint is issued very conservatively; it will not be issued if the address of the variable is ever taken (i.e. &xyz), as that might lead to using its value.

You should check any code that is supposed to be using this variable, and ensure that it is indeed being used. Make sure there are no local variables with the same name that could be "shadowing" this global, and make sure that the paths of code that use this variable are reachable.

Avoid unused static variables with file scopes is analogous to Avoid unused local scope variables; the difference being that Avoid unused static variables with file scopes is for static variables outside of any function and Avoid unused local scope variables is for variables declared inside a function.

Avoid unused local scope variables

The value of the variable xyz is never used, which is sometimes an indication of forgotten functionality.

This complaint is issued very conservatively; it will not be issued if the address of the variable is ever taken (i.e. &xyz), as that might lead to using its value.

You should check any code that is supposed to be using this variable, and ensure that it is indeed being used. Make sure there are no local variables with the same name that could be "shadowing" this variable, and make sure that the paths of code that use this variable are reachable.

Avoid unused local scope variables is analogous to Avoid unused static variables with file scopes; the difference being that Avoid unused static variables with file scopes is for static variables outside of any function and Avoid unused local scope variables is for variables declared inside a function.

Avoid unused parameters

The value of the parameter xyz is never used, which is sometimes an indication of forgotten functionality.

This check is issued very conservatively; it will not be issued if the address of the parameter is ever taken (i.e. &xyz), as that might lead to using its value.

You should check any code that is supposed to be using this variable, and ensure that it is indeed being used. Make sure that the paths of code that use this variable are reachable.

This complaint may be issued under some unexpected circumstances, for example

int foo(char c) { return c == EOF; }

Avoid inefficiencies when passing or returning variables

Declaring a function that takes in a structure as a parameter or that returns a structure by value is inefficient because every invocation will copy all of the structure's fields.

Example:

struct CC { ... }; void f(struct CC xyz); struct CC g(void);

In this example, every call to f() and g() involves copying all of the fields of struct CC. It is usually much more efficient to declare the function to take in or return an address or a reference to the structure instead.

In C++, passing objects in this fashion may involve calls to the copy constructor of the class. This can also lead to inefficiencies.

Avoid = operators in boolean expressions when == is intended

This potential typo can result in drastically different results, and can be difficult to track down. Using = when == was meant will lead to the boolean result being the value of the right hand side, instead of a test for equality.

This complaint would be issued in the following example

if (foo = bar)

The if statement performs an assignment to foo, which should possibly be a mere comparison. On the other hand, it might have been really intended as an assignment followed by testing foo != 0.

C/C++ Data Flow Analysis allows differentiation among three possibilities:

Right hand side is constant; here are three examples
- if (a = 5) if (a = SOME_DEFINED_CONSTANT) if (a = &b)
  - This possibility is very unlikely intended, because the if condition would be always true (or always false if the right hand side were the constant 0).
Right hand side is memory access; here are two examples
- if (a = b) if (a = b->x.y)
  - This possibility is more likely intended.
Right hand side is some computation, which covers all the remaining situations; here are two examples
- if (a = b+c) if (a = f(b))
  - This possibility is most likely intended.

Decide if you meant for this to be an assignment (with the boolean result being the value of the right hand side), or a real test for equality. If this should remain an assignment, consider changing something like:

if(a = b)

to:

if((a = b) != 0)

The intentions are made clear, and C/C++ Data Flow Analysis will not warn about this.
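A typical intentional use of this pattern is a loop that assigns and tests in one step. The function below is a hypothetical sketch of that style, with the comparison written out explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Counts characters up to the terminator. The assignment inside the
   condition is intentional, and the explicit comparison says so. */
size_t count_chars(const char *s)
{
    size_t n = 0;
    char c;
    while ((c = *s++) != '\0')
        n++;
    return n;
}
```

Writing the comparison against '\0' makes it clear to a reader that the assignment is deliberate.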

You can control which of these possibilities are reported.

Avoid using values that are not allowed in expressions

An integer value is assigned to an object, but the value is outside of the domain of values for the object's type.

Examples:

unsigned char uc; signed char sc; ...... sc = uc + 100000;

Independent of what uc's value might be, the result of the addition is too large for a signed character object.

Other examples include mixing integers and enums. Casting a number to an enumeration can have unpredictable results when the number doesn't map to any enumeration value.

Example:

typedef enum { red, blue, green } color; color xyz = (color) 5;

This could be caused by casting to the wrong type, or casting the wrong value. Or, maybe the cast is valid, and the variable should have a value not covered by the enumeration.

By default C/C++ Data Flow Analysis assumes that all variables declared to be of enumerated type can only contain values derived from the enumerators. If the type color does not have this property you can declare it as such using the global parameter beam::dirty_enum or beam::unused_enum_value.

Avoid casts that truncate bits

Example:

char c; int i; c = i;

This will cause the following complaint:

"d.c", line 7: The cast '(char)i' truncates 32 bits into 8

This complaint is usually useful only when loss of information is unexpected - namely when the cast is implied by the compiler and not written explicitly by the programmer.

Turning on Avoid casts that truncate bits will result in more complaints than the similar Avoid using values that are not allowed in expressions. While Avoid using values that are not allowed in expressions is issued only when C/C++ Data Flow Analysis is sure that the value being cast does not fit into the target type, Avoid casts that truncate bits is issued even if the value being cast is unknown. (Of course, Avoid casts that truncate bits will not be issued if it is clear that the value cast does fit into the target type.)

Avoid labels that are not referred to by any goto statement

An unused label was found. This is either a label that can be deleted or a symptom of a missing goto that should be using this label.

Check for any spots where someone should be using goto with this label. If there are no such spots, this label could be deleted. Otherwise, something may be missing somewhere else.

Avoid assert statements that might have side-effects

Invoking functions or using expressions that have side-effects from macros that could be compiled out (like assert) is usually incorrect because the side-effect is lost in some cases, such as when compiling optimized code.

Example:

assert(++a > 0);

The assertion might be disabled when compiling an optimized executable, and the incrementation of a would be lost. It is a better practice to take care of expressions and function calls outside of the macro, save the result in a temporary variable, and then assert a simple equality or inequality.
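A sketch of that practice, applied to the example above, could look as follows. The function name and the temporary variable are hypothetical.

```c
#include <assert.h>

/* The increment happens unconditionally; the assertion only reads the
   saved result, so compiling it out changes nothing else. */
int increment_checked(int *a)
{
    int updated = ++*a;     /* side effect kept outside the assert */
    assert(updated > 0);    /* assertion without side effects */
    (void)updated;          /* avoids an unused warning when NDEBUG is set */
    return *a;
}
```

Whether or not NDEBUG is defined, the caller sees the incremented value.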

This warning is issued for any function invoked from an assert, even if that function actually has no side-effect.

Do not use goto statements

See this classic paper for some ammunition to enable this warning. Read Donald Knuth, Structured Programming with goto Statements, Computing Surveys 6 (4), pp. 261-301, December 1974 to argue for disabling this warning.

Avoid casts from unsigned to signed variables

Example:

long long negate(unsigned c) { return -c; }

This will produce the following complaint for ILP32 targets:

Expression '(long long) -c' is cast into signed 64 bits from unsigned 32 bits

This complaint is issued in situations where the official definition of casting differs from the expectations of most programmers. In the above example suppose that the variable c has the value 1. Most people would expect a return value of -1, but the function will in fact return 4294967295.

The reason is as follows. Since c has type unsigned int, so does -c. Because of the wrap-around semantics of unsigned arithmetic the value of -c on a 32 bit machine is 4294967295. That number is then cast into a 64 bit long long as is.

The recommended fix is:

return -(long long)c;

This complaint is issued conservatively in situations where we expect human expectations to differ from reality. For example, the complaint would not be issued in case of

long long whatever(unsigned c) { return c; }

In general the complaint is issued only when casting the result of an arithmetic operation.
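
The two variants can be compared side by side. This is a small sketch; the names negate_as_written and negate_fixed are hypothetical, and it assumes long long is wider than unsigned int, as on the ILP32 target mentioned above.

```c
/* Negation happens in unsigned arithmetic, then the result is widened. */
long long negate_as_written(unsigned c)
{
    return -c;
}

/* The value is widened first, so the negation is done in signed 64 bits. */
long long negate_fixed(unsigned c)
{
    return -(long long)c;
}
```

For an argument of 1, the first function returns the largest unsigned int value, while the second returns -1.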

Avoid unused functions

The static function foo is declared but never called, which is sometimes an indication of forgotten functionality.

Because the function is static, no one outside of this file can call it. This could be the symptom of a typo (perhaps it was meant to be called from within this file). This could also be a function that was meant to be used, but never was, or a function that used to be used, but no longer is.

Avoid using incompatible defined types in assignments, comparisons, initializations or function calls

Consider the following motivational example:

typedef float time;
typedef float length;
time T;   // in seconds
length L; // in meters
if (T == L) .... // complaint

C/C++ Data Flow Analysis complains here, because you're comparing a length against something that is considered 'time'. That does not seem sensible. You get the idea.

Note that the example compiles just fine. The reason is that time and length are not different types. They are only synonyms for float, so compilers keep quiet about it. C/C++ Data Flow Analysis, however, intentionally treats time and length as different types, because that is sometimes what was intended with the typedef.

The complaint can occur in comparisons, assignments, initializations (which includes passing arguments to functions). It involves two types and both types must either be enum types or defined via typedef.
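
If you want the compiler itself to reject such mixing, one alternative technique (not what C/C++ Data Flow Analysis does) is to wrap each quantity in its own struct. The sketch below uses hypothetical names time_span, length, and speed; with these definitions, passing a length where a time_span is expected does not compile.

```c
/* Each quantity gets its own struct type, so the two cannot be mixed. */
typedef struct { float seconds; } time_span;
typedef struct { float meters; } length;

/* Hypothetical helper: only accepts a length and a time_span, in that order. */
float speed(length distance, time_span elapsed)
{
    return distance.meters / elapsed.seconds;
}
```

The cost of this design is that every arithmetic operation has to go through the member name, which is why plain typedefs remain common.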

Always surround if-then-else clauses and loop bodies with braces

Example:

if (a)
    a = 0;

Some coding guidelines disallow this form and require that the then clause be surrounded by braces, as in

if (a) { a = 0; }

This is to prevent a common mistake of the form

if (a)
    a = 0;
    b = 0;

Here the programmer probably meant to make the added assignment b = 0 to be conditional, but it is unconditional.

The same rules apply to else clauses and loop bodies.

Some coding guidelines do not require the braces if the clause is on the same line as the condition, as in

if (a) a = 0;

The reason is that this form will not lead to the above mistake.
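
The common mistake described above can be sketched as two functions that differ only in the braces. The function names are hypothetical, and the unbraced version is indented the way the compiler actually reads it.

```c
/* Without braces, only the first statement belongs to the if. */
int b_after_unbraced(int a, int b)
{
    if (a)
        a = 0;
    b = 0;              /* runs unconditionally */
    return b;
}

/* With braces, both assignments are conditional. */
int b_after_braced(int a, int b)
{
    if (a) {
        a = 0;
        b = 0;
    }
    return b;
}
```

When a is 0, the unbraced version still clears b, while the braced version leaves it alone.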

Avoid accessing the same variable through two parameters

Each of the two given parameters is either a pointer or a reference, and both are given the address of the same variable. This can have unpredictable results because the variable is accessed under two separate parameters inside the function foo.

Example:

scanf("%d %d", &x, &x);

This will yield the complaint

"foo.C", line 5: Function 'scanf' accesses the same variable 'x' through two parameters #2 and #3

The complaint will be issued for all functions like scanf plus the functions foo, bar, and ns::baz.

The complaint will be issued for all functions except printf, sprintf, etc. However, the single function whose signature is clas::my_printf(const char *, ...) will also be considered for complaints, since it is listed in the functions_by_signature list. Even though my_printf is excluded by the pattern, a function only has to match one of the parameters (either the patterns or the list of signatures) to be included in the complaints.
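
As an illustration of why this matters, here is a sketch of a function that works for two distinct variables but silently destroys the value when both parameters point to the same variable. The names xor_swap, swap_with_itself, and first_after_swap are hypothetical.

```c
/* Swaps two integers without a temporary; assumes a and b are distinct. */
void xor_swap(int *a, int *b)
{
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}

/* Passing the same variable twice zeroes it instead of leaving it alone. */
int swap_with_itself(int value)
{
    int x = value;
    xor_swap(&x, &x);
    return x;
}

/* Normal use: returns the first variable after the swap. */
int first_after_swap(int p, int q)
{
    xor_swap(&p, &q);
    return p;
}
```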

Avoid reassigning a variable the same value

The variable xyz was already assigned this value previously in the code. This could be caused by misspelling the variable name in one of the assignments, assigning the wrong value, or it could be intentional. Look at the previous assignment to xyz and determine if the current assignment is necessary. It may need to be changed because it is incorrect, or it may be unnecessary altogether.

Avoid using compiler-generated copy constructors unintentionally

Example:

class widget { /* some class definition without copy constructor or assignment operator */ }; ... widget *new_w = new widget(old_w);

The above code creates a new instance, pointed to by new_w, that is initialized to a copy of old_w. Given that the class widget has no copy constructor, a compiler-generated copy constructor will be invoked, which copies each member. That may be unintended and can cause serious problems, for example, if the class has a destructor freeing memory allocated by a constructor.

For the above code you would get the complaint

"foo.C", line 42: widget 'old_w' is being copied with a compiler-generated copy constructor

Avoid comparisons using strcmp when the arguments are different

Since strcmp returns 0 (false) if the arguments are the same, code that tests its result directly often does the opposite of what the reader expects.

Example:

if(strcmp(a,b))

This if statement may appear to check if the arguments are the same, but it will only succeed if they are different. To avoid this misunderstanding a coding guideline may require that the if-statement be written as:

if(strcmp(a,b) == 0)

which succeeds when the arguments are the same, or

if(strcmp(a,b) != 0)

which succeeds when they are different.

One of these options should be chosen and used, depending on whether this if statement was meant to check equality or inequality.
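
One way to make the intent explicit everywhere is to hide the comparison behind small named helpers. This is only a sketch; strings_equal and strings_differ are hypothetical names, not part of the C library.

```c
#include <string.h>

/* Nonzero when both strings have the same contents. */
int strings_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Nonzero when the strings have different contents. */
int strings_differ(const char *a, const char *b)
{
    return strcmp(a, b) != 0;
}
```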

Suppose you have your own function foo similar to strcmp and you want to be warned about testing its value without explicit comparison.

Avoid reusing variable names within the same scope

This rule warns you of variable or parameter declarations that hide (or shadow) other variables or parameters in enclosing scopes. It also warns you of class field declarations and class method declarations that hide (or shadow) class fields and methods in base classes.

Example:

int x; int foo(int x) { return x + 1; }

The parameter x of function foo shadows the global variable x.

class Parent { public: int x; }; class Child : public Parent { public: int x; };

The field Child::x shadows the base class field Parent::x.

This check will catch the shadowing of global variables, function parameters, and local variables interchangeably. It also catches the shadowing of class fields and class methods when defining derived classes.
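
Shadowing of a local variable by another local variable can change behavior without any compiler error. The following sketch uses hypothetical function names; in the first version the inner declaration hides the outer index, so the result found in the loop is never returned.

```c
#include <stddef.h>

/* Buggy: the inner 'index' shadows the outer one. */
int first_negative_shadowed(const int *v, size_t n)
{
    int index = -1;
    for (size_t i = 0; i < n; i++) {
        if (v[i] < 0) {
            int index = (int)i;  /* new variable, hides the outer index */
            (void)index;
            break;
        }
    }
    return index;                /* the outer index is still -1 */
}

/* Fixed: assigns to the outer variable instead of declaring a new one. */
int first_negative_fixed(const int *v, size_t n)
{
    int index = -1;
    for (size_t i = 0; i < n; i++) {
        if (v[i] < 0) {
            index = (int)i;
            break;
        }
    }
    return index;
}
```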

Note that a derived class method is not considered to shadow a base class method if any of the following are true:

The base class method is overridden by some other method in the derived class. The fact that it was overridden by some other method implies that the author of the class knows of the method and is simply defining an overloaded version of it here.
The base class method is brought into the scope of the derived class via a using-declaration (like using Base::foo). The fact that it was referenced this way implies that the author of the class knows of the method and is simply defining an overloaded version of it here.

Using the result of function as a boolean value may be inefficient

This rule warns you if you call a function and use the result as a boolean value (by comparing it equal or not equal to zero, or by using it in a boolean context) because there may be a more efficient alternative.

This may be useful for certain functions like strlen. Comparing the first character to the ASCII '\0' character may be more efficient than comparing the length of the string to zero.

Example:

int foo(char *x) { if (strlen(x)) { return 1; } return 0; }

Depending on your compiler, this may be inefficient. The following can be used if efficiency is more important than clarity:

int foo(char *x) { if (x[0] != '\0') { return 1; } return 0; }

Portability checks

These checks will complain about problems that will arise when moving code from one machine width (like a 32-bit machine) to another machine width (like a 64-bit machine).

The "portability target machine"

The description of the machine on which the code is currently built (the "build machine") is taken directly from the compiler configuration file that is being used during the C/C++ Data Flow Analysis run. This file describes the size of int, long, pointers, etc.

Avoid casts of '(int) ptr' that will cause truncation on the target machine

For example, on a 64 bit machine pointers are 64 bits, while an int is only 32 bits. Some casts that work in 32 bit mode will behave differently in 64 bit mode.

Example:

int* ptr; ... int num = (int) ptr;

In this example, the cast will work fine in 32 bit mode, but will truncate the value of ptr in 64 bit mode.

This complaint is issued when a pointer of a certain size on the build machine is cast into a numeric type that is large enough on the build machine, but would be too small on the portability target machine.

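If casting must be done to store pointers in integer types, use a type large enough on all targets (usually unsigned long; note that unsigned long is only 32 bits on 64-bit Windows, where uintptr_t from <stdint.h> is the safer choice).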
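
A minimal sketch of storing a pointer in an integer without truncation, assuming the target provides the optional C99 type uintptr_t. The helper names are hypothetical.

```c
#include <stdint.h>

/* uintptr_t is wide enough to hold any object pointer on the target. */
uintptr_t pointer_to_integer(void *p)
{
    return (uintptr_t)p;
}

void *integer_to_pointer(uintptr_t value)
{
    return (void *)value;
}

/* Nonzero when the pointer compares equal after the round trip. */
int survives_round_trip(void *p)
{
    return integer_to_pointer(pointer_to_integer(p)) == p;
}
```

Replacing uintptr_t with int in this sketch is exactly the pattern the complaint is about.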

A dual complaint is Avoid casts of '(ptr) int' that may behave differently on the target machine.

Avoid casts of '(int) long' that will cause truncation on the target machine

For example, on a 64 bit machine long is 64 bits, while int is only 32 bits. Some casts that work in 32 bit mode will behave differently in 64 bit mode.

Examples:

long num1;
...
int num2 = (int) num1; // Explicit cast from long to int
...
num2 += num1; // Implicit conversion from long to int

In these examples, the conversion will work fine in 32 bit mode, but will truncate the value of num1 or num1 + num2 in 64 bit mode.

This complaint is issued when a variable of numeric type of a certain size on the build machine is cast into a numeric type that is large enough on the build machine, but would be too small on the portability target machine.

If casting must be done between integer types, use a type that is large enough on all targets, or beware of truncation on some.
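
Where a narrowing conversion cannot be avoided, a range check makes the truncation explicit. The sketch below uses long long for the wide type so that it behaves the same on every target; narrow_checked is a hypothetical helper name.

```c
#include <limits.h>

/* Stores value in *out and returns 1 if it fits in an int, else returns 0. */
int narrow_checked(long long value, int *out)
{
    if (value < INT_MIN || value > INT_MAX) {
        return 0;           /* would be truncated, so refuse */
    }
    *out = (int)value;
    return 1;
}
```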

Avoid casts of '(ptr) int' that may behave differently on the target machine

For example, on a 64 bit machine pointers are 64 bits, while int is only 32 bits. Some casts that work in 32 bit mode will behave differently in 64 bit mode.

Example:

int num; ... int* ptr = (int*)num;

In this example, the cast will work fine in 32 bit mode, but will pad the result in 64 bit mode. Since the result is a pointer, and the source type cannot hold a pointer in 64 bit mode, this is almost always incorrect.

This complaint is issued when a variable of numeric type of a certain size on the build machine is cast into a pointer type, where the numeric type is large enough on the build machine to represent all pointer values, but would be too small on the portability target machine.

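If casting must be done between pointers and integer types, use a type that is large enough on all platforms (usually unsigned long; note that unsigned long is only 32 bits on 64-bit Windows, where uintptr_t from <stdint.h> is the safer choice).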

A dual complaint is Avoid casts of '(int) ptr' that will cause truncation on the target machine.

Avoid casts of '(long) ptr' that may cause irreproducible behavior

Casting a pointer to a numeric type can cause unpredictable behavior, because the resulting number depends on where the object happens to be located in memory, and that location generally cannot be determined in advance.

If possible, casting pointers to numeric types should be avoided. The complaint is issued whenever a pointer type is cast into a numeric type.

Avoid casts of '(long *) ptr_to_int' that may behave differently on the target machine

For example, some casts that work in 32 bit mode will behave differently in 64 bit mode.

Example:

int i; /* 32 bits */ long *l = (long *) &i; /* l thinks it has a pointer to 64 bits */ *l = 5; /* 64 bits starting with the address of i are being overwritten */

In this example, the cast will work fine in 32 bit mode, but memory will be overwritten in 64 bit mode.

This complaint is issued when a pointer to a type of a certain size on the build machine is cast to a pointer to a type that is the same size on the build machine, but would be a different size on the portability target machine.

If casting must be done between pointers to numeric types, ensure that the type the destination points to is never larger than the type the source points to, or memory may be corrupted by assignments through the pointer, and invalid memory may be read by dereferencing the pointer.
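
The overwrite can be simulated in a well-defined way. This sketch does not perform the cast itself; it uses fixed-width types and memcpy to mimic a 64-bit store that starts at a 32-bit variable. The struct pair and the function name are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* A 32-bit variable followed directly by another 32-bit variable. */
struct pair {
    int32_t i;
    int32_t neighbor;
};

/* Writes 64 bits starting at the address of i and reports the neighbor. */
int32_t neighbor_after_wide_store(void)
{
    struct pair p = { 0, 1234 };
    int64_t wide = 5;
    memcpy(&p, &wide, sizeof wide);  /* 8 bytes: covers i and neighbor */
    return p.neighbor;               /* no longer 1234 */
}
```

Which of the two fields ends up holding 5 depends on the byte order of the machine, but in both cases the neighbor loses its original value.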

String 'pat' is used as a pattern for 'printf'

Example:

printf(pat);

will generate the complaint

String 'pat' is used as a pattern for 'printf'

If the string pat contains any % specifications, then printf will access whatever happens to be on the stack, which an attacker who controls pat can exploit to read or corrupt memory. It is safer to code the example like this:

printf("%s", pat);

The function printf is not the only one whose format will be checked. C/C++ Data Flow Analysis will check any function's arguments for insecure format, provided that function is given a format attribute.
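
The same fix applies to the other printf-style functions of the C library. The following sketch uses snprintf; format_untrusted is a hypothetical wrapper name.

```c
#include <stdio.h>

/* Copies pat into out literally; any % in pat is treated as plain text. */
int format_untrusted(char *out, size_t out_size, const char *pat)
{
    return snprintf(out, out_size, "%s", pat);
}
```

Because pat is passed as an argument rather than as the format, a string such as "100%d done" is written unchanged.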

Avoid passing untrusted input to argument

Example:

void complain(char *s) { strcpy(buff, s); }

will generate the complaint

Passing untrusted input to argument #2 of 'strcpy'

If the string s is longer than buff then some memory will be overwritten. This can be used by an attacker to make the machine execute attacker-supplied code.

This complaint is related to Do not access an array beyond its bounds, which also checks for buffer overruns. The difference lies in the amount of evidence necessary to issue the complaint. For Avoid passing untrusted input to argument, sufficient evidence is the mere absence of any check of the length of s. That is, Avoid passing untrusted input to argument should be turned on in code where any input string is assumed to be too long unless the programmer checked it and determined that it is not too long. In contrast, for Do not access an array beyond its bounds the burden of proof is on C/C++ Data Flow Analysis to find evidence that the input string is indeed too long.
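
A length check before the copy is the kind of evidence described above. The sketch below is one possible way to write it; copy_checked and its parameters are hypothetical, and whether the tool recognizes this particular form of check is an assumption.

```c
#include <string.h>

/* Copies src into dst only if it fits, including the terminating '\0'.
   Returns 0 on success and -1 if src is too long for the buffer. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);
    if (dst_size == 0 || len >= dst_size) {
        return -1;              /* refuse instead of overrunning dst */
    }
    memcpy(dst, src, len + 1);
    return 0;
}
```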

Enabling C/C++ data flow analysis
- C/C++ data flow analysis requires that your Eclipse project be configured so that include files can be found and resolved.
Specifying C/C++ data flow analysis custom definitions
- To specify custom definitions for C/C++ data flow analysis, open the Preferences dialog box. In it, select Software Analyzer > Data Flow Analysis > C/C++. In the resulting preference page, enter custom definitions in the C/C++ Definitions field. Put each entry on its own line.

Courtesy: this article belongs to IBM.
