In this chapter we discuss a collection of issues having to do with using CCured on real programs. Some of the issues are related to sound handling of dark corners of the C programming language (e.g. function pointers, initialization of globals, variable argument functions). Other issues are related to mechanisms for giving hints to the CCured inferencer, with the ultimate goal of reducing the number of cases in which the inferencer gives up and decides to use the conservative but expensive WILD pointers (e.g. polymorphism, custom memory allocators).
One of the signs that a C program is a “serious” one is the use of function pointers. There would be nothing wrong or unsafe about that if it were not also the case that most programmers do not feel it necessary to use accurate types for function pointers, or even to use function prototypes. This is probably because the syntax for function types in C is terrible. How often have you declared a function pointer to have type void (*)() when you actually wanted to say int * (* (* x3)(int))(float) (a pointer to a function that takes an int and returns a pointer to a function that takes a float and returns a pointer to an int)?
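Typedefs make such types much easier to write correctly. The plain-C sketch below declares the same x3 both with the raw declarator and through two typedefs. The names float_to_intptr_fn, int_to_fnptr_fn, pick_slot, get_picker and storage are made up for illustration. Both spellings denote the same type.

```c
#include <stddef.h>

/* Readable names for the pieces of: int *(*(*x3)(int))(float) */
typedef int *float_to_intptr_fn(float);            /* function: float -> int *            */
typedef float_to_intptr_fn *int_to_fnptr_fn(int);  /* function: int -> pointer to the above */

static int storage[4];

/* Returns a pointer into storage selected by the (truncated) float argument. */
static int *pick_slot(float f) {
    int idx = (int)f;
    if (idx < 0 || idx > 3)
        return NULL;
    return &storage[idx];
}

/* Ignores its int argument and always returns pick_slot. */
static float_to_intptr_fn *get_picker(int unused) {
    (void)unused;
    return pick_slot;
}

/* The same variable type, written both ways. */
int *(*(*x3)(int))(float) = get_picker;
int_to_fnptr_fn *x3_readable = get_picker;
```

With these definitions, x3(0)(2.0f) yields &storage[2]. The typedef version makes it obvious which parameter list belongs to which function.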
Of course, misusing function pointers can lead to the worst kind of errors. Fortunately, such errors rarely go unnoticed in code that is executed.
CCured supports two kinds of function pointers. SAFE function pointers can only be invoked with the number of arguments that the function expects. If the types of the arguments are not right, it is the argument that becomes WILD, not the function pointer. A SAFE function pointer can only be cast to an integer or to the same function pointer type. We also have WILD function pointers, which you can (try to) use as you please. In fact, a WILD function pointer can be cast to any other WILD pointer type and can be stored in any tagged area. For this reason its representation must match that of any WILD pointer. However, the capabilities of a WILD function pointer are quite different from those of a WILD pointer to data. For example, you should not be able to read from or write through a function pointer.
The next picture shows the meaning of the _b field for a WILD function pointer.
Any function whose address is taken and becomes WILD, or that is used without a prototype (see the discussion at the end of this section), is a tagged function and has an associated descriptor that encodes a pointer to the function's code and its number of arguments. Here is an example:
int taggedfun(int anint, int * aptr) {
return anint + * aptr;
}
int main() {
int * i = taggedfun; // Bad cast. taggedfun becomes tagged
// Now we invoke it
((void (*)(int,int*))i)(5, i);
}
The structure of a function descriptor is as shown below and a pointer to the _pfun field is used as the _b field whenever the address of the function is taken.
struct __functionDescriptor {
unsigned int _len ; // Always 0
void (* _pfun)() ; // Pointer to a function
unsigned int _nrargs ; // The number of arguments
};
Since the _len field is always initialized to zero, whenever this WILD pointer is used for a read or a write it would appear that it points into a zero-length tagged memory area, so the bounds check will fail. We then have to protect against the pointer being subject to arithmetic prior to invocation. We do this by storing in the function descriptor the actual pointer to the function and checking at the time of a call through a WILD function pointer that the _p field of the pointer is equal to the _pfun field in the descriptor.
Finally, we have to ensure that the function is called with the right number and kinds of arguments. There is no hope of ensuring this statically, because a WILD function pointer can be used as liberally as any other WILD pointer. So CCured conservatively forces all arguments and the return type to be WILD pointers. This includes arguments and return types that are actually scalars (see the example above for how integers are wrapped into WILD pointers). This ensures that the types are the same (or compatible), and all we have to check is that the right number of arguments is passed to the function. To perform these checks we use the following run-time support function:
/* Check that a function pointer points to a valid tagged function and check
that we are passing enough arguments. We allow the passing of more
arguments than the function actually expects */
__CHECK_FUNCTIONPOINTER(void *_p, /* The _p field of the function pointer */
void *_b, /* The _b field */
int nrActualArgs); /* The number of actual arguments */
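To make the check concrete, here is a plain-C model of what is verified at a call through a WILD function pointer, written against a copy of the descriptor layout above. This is a sketch and not CCured's run-time code. The names generic_fn, fun_descriptor and check_function_pointer are hypothetical, and a return value of 0 stands in for the run-time failure CCured would report. The _b field is modeled as a pointer to the descriptor's pfun member, as described above.

```c
#include <stddef.h>

typedef void (*generic_fn)(void);

/* Same layout as struct __functionDescriptor. */
struct fun_descriptor {
    unsigned int len;     /* always 0, so data reads/writes fail their bounds check */
    generic_fn pfun;      /* the real code pointer */
    unsigned int nrargs;  /* number of formal arguments */
};

/* Hypothetical model of __CHECK_FUNCTIONPOINTER.
   p: the _p field; b: the _b field, pointing at a descriptor's pfun member.
   Returns 1 if the call may proceed, 0 otherwise. */
static int check_function_pointer(generic_fn p, const generic_fn *b, int nr_actual_args) {
    const struct fun_descriptor *d;
    if (b == NULL)
        return 0;                                  /* no descriptor at all */
    d = (const struct fun_descriptor *)
            ((const char *)b - offsetof(struct fun_descriptor, pfun));
    if (d->pfun != p)
        return 0;                                  /* _p was moved by arithmetic */
    return nr_actual_args >= (int)d->nrargs;       /* extra actuals are tolerated */
}

/* The tagged function from the example, with a hand-built descriptor. */
static int taggedfun(int anint, int *aptr) { return anint + *aptr; }

static struct fun_descriptor taggedfun_desc = { 0, (generic_fn)taggedfun, 2 };
```

Passing &taggedfun_desc.pfun as b and (generic_fn)taggedfun as p with two or more actuals passes the check. A call with fewer actuals, or with a p that no longer equals pfun, is rejected.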
Also, always use prototypes for the external functions you call. Otherwise, it will appear to CCured that you are casting the function pointer to various incompatible types, one for each use, and the function will be declared tagged (and pointers to it will be WILD). CCured helps somewhat here, because its whole-program merger constructs prototypes for the functions that are defined somewhere in your program. But even when you use simple things like printf, you must include the proper header files.
The main function is the entry point to your program. The most general type of the function is:
int main(int argc, char **argv, char **envp);
although when the arguments are not used it is common to omit them. Depending on how you use the argv and envp arguments, CCured might decide that they should be of some non-SAFE kind. In that case CCured will generate code that makes copies of the argv and envp arguments with the appropriate kind.
Take a look at what happens for this example:
int main(int argc, char **argv) {
for(; *argv; argv ++) { // Scan the args
char *p = *argv;
while( *p) { p ++; }
}
}
CCured will also insert a call to ccuredInit, which initializes the CCured run-time library.
C has a very rich language of initializers for globals and locals. The language is so rich that neither gcc nor MSVC implements it fully. For a discussion of how our front-end handles initialization, please see the CIL documentation.
Once a program is presented to CCured, all initialization of locals is turned into assignments, but most initialization code for globals is preserved. However, in some cases CCured must insert checks related to the initializers. These checks are placed in a special function called a global initializer.
The name of a global initializer starts with __globinit. CCured will try to insert a call to the global initializer that it creates in the main function to ensure that it is run before anything else in the program. If it cannot find a main it will emit a warning:
Warning: Cannot find main to add global initializer __globinit_myfile
If you see such warnings and intend to actually run the code, make sure whoever invokes any function in the cured code calls the global initializer first.
In CCured it is Ok to cast any kind of pointer to an integer, and in fact any pointer comparison is performed after such a cast. But if you try to cast an integer to a pointer the following two things happen:
This means that such pointers cannot be used in memory dereferences. If your program casts a pointer to an integer and then back to a pointer, this will be an issue. CCured will emit a warning whenever this happens. So far we have seen very few programs that do this, and those that do use one of a few forms.
Some programs are just not careful about keeping pointers separate from integers and gratuitously cast to integers. The solution in that case is to change the type of the intermediate location to a void* (or to a more precise type of pointer if possible).
Other programs cast pointers to integers because they want to do pointer arithmetic without worrying about the implicit scaling that C applies to pointer arithmetic. Use char* to do such arithmetic.
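For example, a byte offset can be applied through a char* cast and the result viewed with the original pointer type, with no integer in between. This is a minimal sketch in plain C. The helpers advance_bytes and byte_distance are hypothetical names, not part of CCured.

```c
#include <stddef.h>

/* Move p forward by nbytes bytes (not elements) and view the result as int *. */
static int *advance_bytes(int *p, size_t nbytes) {
    return (int *)((char *)p + nbytes);
}

/* Distance in bytes between two pointers into the same array. */
static ptrdiff_t byte_distance(const int *from, const int *to) {
    return (const char *)to - (const char *)from;
}
```

Because the arithmetic stays on pointers, the result keeps the metadata of the original pointer instead of being manufactured from an integer.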
Some other programs also want to do arithmetic but of a kind not allowed for char* such as the following code which tries to align a pointer to a 16 byte boundary:
int* alignit(int *x ) {
return (int*)((int)x & ~15);
}
The solution here is a cute trick that can be used in some situations to cast an integer into a pointer, provided you know that it should have the same metadata as some other legal pointer. Thus, to cast the integer x to a pointer while borrowing the metadata from pointer p, do:
p + (x - (int)p);
Thus you are turning a cast into pointer arithmetic. CCured will force the kind of the pointer to be either WILD or SEQ, but everything will work as expected. Of course, you have to scale the difference back by the size of the type pointed to by p. Here is how the previous alignit function can be written:
int* alignit(int *x ) {
int ix = (int)x & ~15;
return x + (ix - (int)x) / (int)sizeof(*x); // scale the (non-positive) byte difference to ints
}
If you are lazy and do not want to change your code, you can ask CCured to insert code that records the line number of every cast from a scalar to a pointer. Then, when a non-pointer is dereferenced, the CCured run-time system will try to tell you which cast in your program produced this fake pointer. Use the interceptCasts pragma for this purpose (see Section 9.10). We have not found this feature very useful, because not many programs actually cast integers to pointers.
It is obvious by now that CCured will change the layout of some datatypes. That can lead to several kinds of problems. For example, if you are calling a library that is not cured, then you had better not change the layout of the data that is passed back and forth. This issue is discussed more in Chapter 8. Another problem arises when code is written assuming that datatypes have a certain layout, such as the following code, which computes the number of elements in an array assuming that each pointer is 4 bytes long:
#include <stdio.h>
int *a[8];
void bad_code(int * *x) {
int * pa = a; // Make a's elements WILD
printf("a has %d elements\n", (int)(sizeof(a) / 4));
}
This code will probably print 16 instead of 8, because each element of a is now 8 bytes long. Such code is simply wrong, and it cannot be cured without manual intervention. So, let's assume that we change the code to:
#include <stdio.h>
int *a[8];
void so_and_so_code(int * *x) {
int * pa = a; // Make a's elements WILD
printf("a has %d elements\n", (int)(sizeof(a) / sizeof(int *)));
}
As it turns out this code is perfectly fine but our inferencer cannot tell that there is a connection between the type of the element of a and the int * that appears in the argument of sizeof. Even though the bad cast will force the array elements to be WILD pointers the int * that appears in the argument of sizeof will be a SAFE pointer. Thus this code will also print 16. In fact, you will see a warning:
pathexec_env.c:42: Warning: Encountered sizeof(int */* __attribute__((___ptrnode__(2595))) */) when type contains pointers. Use sizeof expression. Type has a disconnected node.
If CCured says that the type has a connected node, then you are probably Ok. It means that the node inside sizeof is connected to the other nodes, so it will probably get the right kind. However, if CCured says that the type has disconnected nodes then you should worry.
To make the connection explicit, change the code as shown below. It can also be argued that this code is clearer and thus should be used even if you do not use CCured.
#include <stdio.h>
int *a[8];
void good_code(int * *x) {
int * pa = a; // Make a's elements WILD
printf("a has %d elements\n", (int)(sizeof(a) / sizeof(a[0])));
}
Similar problems arise for any use of sizeof, such as in the argument of allocation functions.
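The same remedy applies to allocation sites. Take sizeof of an expression rather than spelling out a type, so that the size refers to the same type as the pointer being allocated. The sketch below is ordinary C, and COUNT_OF and make_table are illustrative names. We assume, by analogy with good_code above rather than from CCured documentation, that this form also keeps CCured's nodes connected.

```c
#include <stdlib.h>

/* Element count tied to the array object itself, like sizeof(a) / sizeof(a[0]). */
#define COUNT_OF(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Allocate n pointer slots. sizeof *table follows table's declared type,
   so the size stays right if the element type or its layout changes. */
static int **make_table(size_t n) {
    int **table = malloc(n * sizeof *table);
    if (table != NULL) {
        for (size_t i = 0; i < n; i++)
            table[i] = NULL;
    }
    return table;
}
```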
Variable-argument functions in C are inherently unsafe since there is no language-level mechanism to ensure that the actual arguments agree in type and number with the arguments that the function will be using. There are several ways to implement variable argument functions in C and CCured supports some of them quite well:
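There are two kinds of variable-argument functions in C:
vararg functions, which receive the variable arguments directly after the last named parameter, for example:
int printf(const char* format, ...)
valist functions, which receive the variable arguments packaged in a va_list, for example:
int vprintf(const char* format, va_list args)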
CCured supports both kinds of functions and will scan the program to find out for each function what types of arguments are passed. In Section 9.6.1 we describe how the programmer can prevent this automatic inference by specifying the set of types of arguments.
CCured redefines the macros in <stdarg.h> and <varargs.h> to do special bookkeeping. In vararg functions, the macro va_start is used to initialize a va_list variable to point to the trailing arguments. CCured checks that the second argument of va_start is the last named formal parameter, the one immediately preceding the ...
Both in vararg and valist functions the macro va_arg can be used, as follows:
T x = va_arg(args, T);
args must be a va_list variable, and T must be compatible, after the default argument promotions (e.g. char and short to int, and float to double), with one of the types in the struct associated with args. CCured checks this at run time.
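A minimal pair of functions in plain C illustrates both kinds and the rules above. sum_ints is a vararg function whose va_start names its last formal, count. vsum_ints is a valist function that fetches each argument with va_arg. Both names are hypothetical. Note that a char actual is fetched as int because of the default argument promotions.

```c
#include <stdarg.h>

/* valist function: consumes count ints from an already started list. */
static int vsum_ints(int count, va_list args) {
    int total = 0;
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);   /* char/short actuals arrive promoted to int */
    return total;
}

/* vararg function: the second argument of va_start is the last named formal. */
static int sum_ints(int count, ...) {
    va_list args;
    int total;
    va_start(args, count);
    total = vsum_ints(count, args);   /* pass the list down the call chain */
    va_end(args);
    return total;
}
```

Passing the list down to vsum_ints is the direction the text says CCured allows. Returning args to a caller or storing it in the heap would be rejected.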
The CCured support for variable argument functions is quite flexible. Multiple variable argument lists can be processed in parallel, and an argument list can be re-initialized with va_start and processed multiple times. A function can even work with variable argument lists that accept different sets of types (but for this you need to specify manually the set of argument types, as explained in Section 9.6.1). Variable argument lists can be passed down the call chain, but the regular CCured checks for stack-allocated variables will prevent passing these lists up the call chain and also storing them in the heap.
The main thing that is not supported in CCured is the fetching of an argument with a different type than it was stored. It remains to be seen if this is a problem. We have looked at several variable argument functions (including full implementations of printf and sprintf) and so far we have found that CCured accepts those functions without any change except for the specification of the struct of the accepted argument types (as explained below).
If you do not want CCured to find automatically all the types that can be passed to a function, you can specify the set of types that can be used for arguments. Also, you should not let CCured infer the argument types for printf-like functions, but you should instead use the special support for them, as explained in Section 9.6.2.
You can declare the argument types by declaring a descriptor. This is a struct data type whose fields have the types that can be passed to the function. The order and the names of the fields do not matter. For example, such a struct for printf would be the following (this structure is defined in ccured.h):
struct printf_arguments {
int f_int;
double f_double;
char *f_string;
};
The simplest way to specify that such a struct describes the types of arguments for a variable argument function is to use a pragma:
#pragma ccuredvararg("myvarargfunction", sizeof(struct printf_arguments))
Notes:
An equivalent method is to associate the __CCUREDVARARG(struct printf_arguments) attribute with the type of the function myvarargfunction:
int (__CCUREDVARARG(struct printf_arguments) myvarargfunction)(int last, ...);
You have to use this method if you want to specify that a function pointer is variable argument:
int (__CCUREDVARARG(struct printf_arguments) * myvarargptr)(int last, ...);
typedef int (__CCUREDVARARG(struct printf_arguments) fptr)(char *format,...);
A more fine-grained way to specify the same thing is to use the __CCUREDVARARG type attributes for va_list every time it appears. This allows you to specify different sets of types for different locals:
va_list __CCUREDVARARG(struct printf_arguments) args1,
__CCUREDVARARG(struct some_other_type) args2;
Since the vast majority of uses of variable argument functions are for printf-like functions, CCured contains special support for them. Specifically, if a vararg function is declared to be printf-like, then all of its invocations in which the format string is a constant will be checked statically. For the other invocations, a wrapper for printf will be called that checks the types of the actuals against the format string before calling the real printf function.
To declare a function to be printf-like use the following pragma:
#pragma ccuredvararg("myprintf", printf(1))
where the last argument is the index of the format argument in the argument list (starting with 1). Note that you will get a run-time error if you try to use the va_arg macro in the implementation of such a function. In those implementations you should invoke functions like vprintf and vsprintf instead.
GCC already has support for communicating to the compiler that a function is printf-like. This is done as follows:
int myprintf(const char* format, ...) __attribute__((format(printf, 1, 2)));
where the “1” means that the first argument is the format string and the “2” means that checking should start with the second argument. CCured recognizes this attribute and considers it equivalent to the ccuredvararg pragma above. Note that CCured ignores the second number in the format attribute (the “2” above).
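Following the advice above, a printf-like function can forward its va_list to vsnprintf instead of calling va_arg itself. It can also carry the GCC format attribute, so that calls with constant format strings are checked. The name format_into is hypothetical. This is a sketch in standard C plus the GCC attribute, not CCured library code.

```c
#include <stdarg.h>
#include <stdio.h>

/* Argument 3 is the printf-style format; checking starts at argument 4. */
int format_into(char *buf, size_t size, const char *format, ...)
    __attribute__((format(printf, 3, 4)));

/* Never calls va_arg itself: the whole list is handed to vsnprintf. */
int format_into(char *buf, size_t size, const char *format, ...) {
    va_list args;
    int n;
    va_start(args, format);
    n = vsnprintf(buf, size, format, args);
    va_end(args);
    return n;
}
```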
You can use the format attribute even for function pointers:
int (__attribute__((format(printf, 1, 2))) *myptr)(char *format, ...);
Note that CCured does not currently like passing pointers to printf with the intention of printing the pointer value. You should manually cast those pointers to long when passing them to printf-like functions.
Also, you should not let CCured automatically infer the descriptors for printf-like functions. Otherwise, it is quite likely that the inferred descriptor will differ from the built-in descriptor printf_arguments (which the runtime library uses to check calls to printf-like functions). CCured will warn you about all automatically inferred descriptors, and you should manually inspect all the functions involved.
As for the regular variable argument functions, the pragma works only for named functions but not for pointers to functions. For that purpose you must use attributes:
int (__CCUREDFORMAT(1) * myprintf)(char *format, ...);
typedef int (__CCUREDFORMAT(1) fptr)(char *format,...);
Since it proved too much trouble to handle scanf-like functions in a safe yet transparent way we currently require the programmer to rewrite the invocations to scanf using a number of functions that we provide. For example instead of
int entry; double then; char buffer[6];
... fscanf(file, "Entry:%d; Then:%lf; 5 digits:%5[0-9]; useless text.",
&entry, &then, buffer) ...
you should write
... (resetScanfCount(),
entry = ccured_fscanf_int(file, "Entry:%d"),
then = ccured_fscanf_double(file, "; Then:%lf"),
ccured_fscanf_string(file, "; 5 digits:%5[0-9]", buffer),
ccured_fscanf_nothing(file, "; useless text."), //advance the file pointer.
getScanfCount ()) ...
The functions resetScanfCount and getScanfCount are necessary only if you use the result of the call to fscanf in the original code. Note that our replacement scanf functions can be used to return only one result at a time, consequently the format string that is passed must contain only one format specifier, possibly along with characters to be matched.
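The sketch below models the one-specifier-per-call style with plain fscanf and a success counter. It is not CCured's implementation. sketch_reset_scanf_count, sketch_get_scanf_count, sketch_fscanf_int and sketch_fscanf_double are hypothetical stand-ins. We also assume, without confirmation from the text, that the count mimics fscanf's return value, i.e. it counts successfully converted items.

```c
#include <stdio.h>

/* Hypothetical counter playing the role of resetScanfCount/getScanfCount. */
static int sketch_scanf_count;

static void sketch_reset_scanf_count(void) { sketch_scanf_count = 0; }
static int sketch_get_scanf_count(void) { return sketch_scanf_count; }

/* Reads exactly one int; format must contain a single %d-style specifier. */
static int sketch_fscanf_int(FILE *f, const char *format) {
    int value = 0;
    if (fscanf(f, format, &value) == 1)
        sketch_scanf_count++;
    return value;
}

/* Reads exactly one double; format must contain a single %lf-style specifier. */
static double sketch_fscanf_double(FILE *f, const char *format) {
    double value = 0.0;
    if (fscanf(f, format, &value) == 1)
        sketch_scanf_count++;
    return value;
}
```

Because each call converts only one item, the type of the destination is fixed by the function name, and no pointer has to be passed through the variable-argument list.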
The following are the scanf-like functions that we currently support:
extern int ccured_fscanf_int(FILE *, char *format);
extern double ccured_fscanf_double(FILE *, char *format);
extern void ccured_fscanf_string(FILE *, char *format, char *string);
extern void ccured_fscanf_nothing(FILE *, char *format);
If the original program uses scanf, just consider that you are using fscanf from stdin. If instead your program contains sscanf then you can use the function
void resetSScanfCount(char *string);
to dump the string to the temporary file ccured_sscanf_file, and then use the replacement for fscanf from above. For example,
... (resetSScanfCount(inputString),
entry = ccured_fscanf_int(ccured_sscanf_file, "Entry:%d"),
then = ccured_fscanf_double(ccured_sscanf_file, "; Then:%lf"),
ccured_fscanf_string(ccured_sscanf_file, "; 5 digits:%5[0-9]", buffer),
getScanfCount ()) ... //getScanfCount is required when using resetSScanfCount
Note that the current support for scanf is far from satisfactory and will likely change in the future.
Almost all of the checking for variable-argument functions is done at run time. At the time of a call, each actual argument is compared with the types in the struct associated with the vararg function. A global data structure is filled with the number of arguments (in the global __ccured_va_count) and a list of indices describing, for each actual argument, its index among the types in the struct (in __ccured_va_tags).
In the body of a vararg function, a data structure is allocated on the stack to hold a copy of the global description of the arguments that was created by the caller. The call to va_start initializes this data structure and each call to va_arg checks that we are not reading past the end of the actuals and also that the type of the fetched argument matches that of the actual argument.
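The scheme can be modeled in plain C as follows. All names (sketch_va_count, sketch_va_tags, sketch_record_call, va_check_start, va_check_next, sum_ints_checked) are hypothetical stand-ins for __ccured_va_count, __ccured_va_tags and the code CCured generates. A return value of 0 stands in for a run-time failure. In CCured the caller-side recording is inserted automatically, whereas here it must be done by hand with sketch_record_call.

```c
#include <stdarg.h>

/* Field indices in a descriptor such as struct printf_arguments. */
enum { TAG_INT = 0, TAG_DOUBLE = 1, TAG_STRING = 2 };

#define MAX_VA_ARGS 16

/* Hypothetical stand-ins for __ccured_va_count and __ccured_va_tags. */
static int sketch_va_count;
static int sketch_va_tags[MAX_VA_ARGS];

/* Caller side: record how many variable actuals there are and their tags. */
static void sketch_record_call(int count, const int *tags) {
    if (count > MAX_VA_ARGS)
        count = MAX_VA_ARGS;
    sketch_va_count = count;
    for (int i = 0; i < count; i++)
        sketch_va_tags[i] = tags[i];
}

/* Callee side: a private copy of the description kept on the stack. */
struct va_check {
    int count;
    int next;
    int tags[MAX_VA_ARGS];
};

/* Plays the role of va_start: copy the caller's global description. */
static void va_check_start(struct va_check *c) {
    c->count = sketch_va_count;
    c->next = 0;
    for (int i = 0; i < c->count; i++)
        c->tags[i] = sketch_va_tags[i];
}

/* Called before each va_arg: reject reads past the end and type mismatches. */
static int va_check_next(struct va_check *c, int tag) {
    if (c->next >= c->count || c->tags[c->next] != tag)
        return 0;
    c->next++;
    return 1;
}

/* Sums n int actuals; returns 0 where CCured would stop with an error. */
static int sum_ints_checked(int *out, int n, ...) {
    struct va_check c;
    va_list args;
    int total = 0;
    va_check_start(&c);
    va_start(args, n);
    for (int i = 0; i < n; i++) {
        if (!va_check_next(&c, TAG_INT)) {   /* check before fetching */
            va_end(args);
            return 0;
        }
        total += va_arg(args, int);
    }
    va_end(args);
    *out = total;
    return 1;
}
```

The check always runs before va_arg, so a mismatched or missing argument is never actually fetched.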
As we have seen in Section 3.2.2 CCured can handle union types whose fields have compatible pointer types at corresponding offsets. If this is not the case then you will need to tell CCured how to handle the union. One option is to turn the union into a struct, but we do not recommend this because it increases memory usage and can change the behavior of your program if your code writes to one union field and then reads from a different one. A better option is to declare that the union is a tagged union. CCured actually supports two forms of tagged unions: one in which CCured adds a tag field and maintains it for you, and one in which your program maintains its own tag, and CCured checks that it is used properly.
You can declare a union to be tagged by adding the attribute __TAGGED to its definition. CCured will expand the union to contain a tag field. A tag is an RTTI value (Section 7.3) that encodes the type of the last field written in each union value. Here is an example:
union int_or_ptr {
int i;
int *p;
} __TAGGED; // We declare it tagged
int main() {
union int_or_ptr x;
int i;
x.i = 5;
i = x.i; // This will work
i = * x.p; // This will fail
x.p = &i;
i = x.i; // This will fail
i = * x.p; // This will work
}
You can see that CCured has defined the following structure:
struct tagged_int_or_ptr {
struct RTTI_ELEMENT * __tag ;
union int_or_ptr __data ;
} __TAGGED ;
You can also see in the code that CCured generates assignments to the __tag field before each assignment to a union field. And CCured inserts calls to CHECK_UNIONTAG before each read-access to the field.
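What the generated code does can be mimicked by hand in plain C. In this sketch a small enum replaces the RTTI value that CCured stores in __tag. The accessor names set_i, set_p, get_i and get_p are hypothetical. A return of 0 from a getter marks the spot where CCured's CHECK_UNIONTAG would fail.

```c
#include <stddef.h>

union int_or_ptr {
    int i;
    int *p;
};

/* A plain enum stands in for the RTTI value CCured keeps in __tag. */
enum int_or_ptr_tag { TAG_NONE = 0, TAG_I, TAG_P };

struct tagged_int_or_ptr_sketch {
    enum int_or_ptr_tag tag;   /* type of the last field written */
    union int_or_ptr data;
};

/* Every write sets the tag, then the data. */
static void set_i(struct tagged_int_or_ptr_sketch *u, int v)  { u->tag = TAG_I; u->data.i = v; }
static void set_p(struct tagged_int_or_ptr_sketch *u, int *v) { u->tag = TAG_P; u->data.p = v; }

/* Every read checks the tag first; 0 means the check failed. */
static int get_i(const struct tagged_int_or_ptr_sketch *u, int *out) {
    if (u->tag != TAG_I)
        return 0;
    *out = u->data.i;
    return 1;
}

static int get_p(const struct tagged_int_or_ptr_sketch *u, int **out) {
    if (u->tag != TAG_P)
        return 0;
    *out = u->data.p;
    return 1;
}
```

This reproduces the behavior of the int_or_ptr example: after an int is written, reading the pointer fails, and vice versa.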
Notes:
typedef struct foo {
int *f1;
} Foo;
union ptrs {
void * v;
Foo * f;
int * i;
} __TAGGED; // We declare it tagged
int main() {
union ptrs u1, u2;
Foo f;
//We write to the "void*" field, and read from the "Foo*" field.
//This works because rtti_ptr has Run-Time Typing Information.
void* __RTTI rtti_ptr = &f;
u1.v = rtti_ptr;
Foo* pf = u1.f; // This will work
int* pi = u1.i; // This will fail
//We can also write to the Foo* field, and read from the void* field a
// pointer that has RTTI info.
u2.f = &f; //write to the "Foo*" field
rtti_ptr = u2.v; // and read from the "void*" field
pf = (Foo*)rtti_ptr; //This checked downcast will succeed.
return 0;
}
Many programs define their own tagged unions, in which a struct contains a tag field and a union holding one of several types of data. In these cases, using a CCured-supplied tag is redundant. You can annotate a union to tell CCured what meaning you assign to various tag values, and CCured will then check that the tags are maintained properly.
When a tag is modified, the “data” portion of the structure will be zeroed. Therefore, when writing both the tag and data portions, programs must always modify the tag first, followed by the data.
When a program reads or writes the data part of a tagged union, CCured will read the tag and check that it is appropriate for the union field being accessed.
Tags are defined by annotating each union field with __SELECTEDWHEN(exp) where exp is a boolean expression. exp may contain integer arithmetic and comparisons, and it can refer to the runtime value of a field in an enclosing struct by specifying the name of that field. For example:
enum tags {
TAG_ZERO = 0,
};
struct host {
short tag; // 0 for integer, 1 for structure, 10 to 12 for pointer to int
union bar {
int anint __SELECTEDWHEN(tag == TAG_ZERO);
struct str {
int * * ptrptr;
float f;
} structure __SELECTEDWHEN(tag == 1);
int * ptrint __SELECTEDWHEN(tag >= 10 && tag <= 12);
} data;
} g;
int x;
int main() {
g.tag = 12; //Select g.data.ptrint
g.data.ptrint = &x;
int* px = g.data.ptrint; //To check that it's okay to access g.data.ptrint,
//CCured checks "g.tag >= 10 && g.tag <= 12"
return 0;
}
In this case, the __SELECTEDWHEN attributes tell CCured that the field data.anint is active when the tag field is 0, the field data.structure is active when the tag field is 1, and the field data.ptrint is active when the tag field is between 10 and 12.
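The checks that CCured derives from these annotations can be written out by hand. The sketch below mirrors struct host. host_set_tag, ptrint_selected, host_write_ptrint and host_read_ptrint are hypothetical names. The zeroing of the data portion on a tag write is done here with memset, and a return of 0 marks where CCured would report a failed check.

```c
#include <string.h>

struct str {
    int **ptrptr;
    float f;
};

struct host {
    short tag;   /* 0: anint, 1: structure, 10 to 12: ptrint */
    union bar {
        int anint;
        struct str structure;
        int *ptrint;
    } data;
};

/* Writing the tag clears the data, which is why the tag must be written first. */
static void host_set_tag(struct host *h, short tag) {
    h->tag = tag;
    memset(&h->data, 0, sizeof h->data);
}

/* The predicate from __SELECTEDWHEN(tag >= 10 && tag <= 12). */
static int ptrint_selected(const struct host *h) {
    return h->tag >= 10 && h->tag <= 12;
}

/* Checked accessors for data.ptrint; 0 means the check failed. */
static int host_write_ptrint(struct host *h, int *p) {
    if (!ptrint_selected(h))
        return 0;
    h->data.ptrint = p;
    return 1;
}

static int host_read_ptrint(const struct host *h, int **out) {
    if (!ptrint_selected(h))
        return 0;
    *out = h->data.ptrint;
    return 1;
}
```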
Notes:
Programmers can annotate array pointers with length attributes. CCured will then use the annotated length whenever it needs to do a bounds check on that pointer, instead of transforming the pointer into a fat pointer. This has two advantages:
Length annotations are allowed in two situations: struct fields may have length annotations that depend on the values of other fields in that struct, and function parameters may have lengths that depend on other parameters in that function. (NB: but the annotations on function parameters are not yet implemented. Coming soon ...)
Only pointer types may be annotated. The annotation __SIZE(exp) on a field means that the associated pointer is exp bytes long, where the expression exp can involve integer constants, arithmetic, sizeof, and the names of other fields in the same struct. So __SIZE(1 + foo) means that the specified pointer has a length that's one greater than the runtime value of field foo in the same object.
__COUNT(exp) means that the pointer is exp elements long. So when annotating a pointer with type T*, the annotation __COUNT(exp) is equivalent to __SIZE(exp * sizeof(T)).
Any field that is referred to by a __SIZE or __COUNT annotation is a metadata field. When a metadata field is modified, any pointer fields that depend on it are set to NULL. Therefore, when writing both the metadata and pointer fields, programs must always modify the metadata first, followed by the pointer.
When an annotated pointer field is read, CCured will read any metadata fields as well, and associate that length with the pointer. When a pointer field is written, CCured will check that the buffer's length is greater than or equal to the length specified by the current value of the metadata fields.
extern void* malloc(int);
#pragma ccuredalloc("malloc", sizein(1), nozero)
struct bar {
int nrInts;
int *ints __COUNT(nrInts);
};
struct foo {
int sizeBars;
struct bar * bars __SIZE(sizeBars);
};
// Now the function that uses it
void init(struct foo* pFoo) {
int nrBars = 5;
pFoo->sizeBars = nrBars * sizeof(* pFoo->bars);
pFoo->bars = (struct bar*)malloc(pFoo->sizeBars);
}
In this code, we first overwrite the field pFoo->sizeBars, which automatically sets the field pFoo->bars to NULL. The next step is to write a new pointer to the pFoo->bars field. During this write, CCured will check that the pointer being written (in this case, the result of malloc) is at least “pFoo->sizeBars” bytes long.
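The two rules can be modeled in plain C. An ordinary C pointer carries no length, so this sketch passes the buffer's real length explicitly where CCured would use the pointer's own metadata. foo_set_sizeBars and foo_set_bars are hypothetical functions, and a return of 0 marks a failed check.

```c
#include <stddef.h>
#include <stdlib.h>

struct bar {
    int nrInts;
    int *ints;          /* would carry __COUNT(nrInts) */
};

struct foo {
    int sizeBars;       /* metadata: length of bars in bytes */
    struct bar *bars;   /* would carry __SIZE(sizeBars) */
};

/* Writing the metadata field nulls the dependent pointer. */
static void foo_set_sizeBars(struct foo *f, int size) {
    f->sizeBars = size;
    f->bars = NULL;
}

/* Writing the pointer checks that the buffer, whose real length is
   buf_bytes, covers at least sizeBars bytes. */
static int foo_set_bars(struct foo *f, struct bar *buf, size_t buf_bytes) {
    if (buf_bytes < (size_t)f->sizeBars)
        return 0;
    f->bars = buf;
    return 1;
}
```

The order mirrors init above: set the size first (which clears bars), then store a buffer that is at least that long.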
The high order bit: we use the Boehm-Weiser garbage collector.
TODO : finish this section
The following pragmas are recognized by CCured. Note that pragmas can only appear in between global declarations. Some of them are discussed in more detail in following sections: