CHAPTER 8 The C preprocessor

Reference: Kelley & Pohl, Chapter 8
Additional reference: Peter A. Darnell, Philip E. Margolis, Software Engineering in C, Springer-Verlag, New York.

The compiling process -- overview

    % cc -c foo.c

    foo.c   C program
      |  cpp -- C preprocessor (handles #-directives; removes comments)
      v
    foo.E
      |  ccom -- C compiler (compiles the program)
      |  C optimizer (optional)
      v
    foo.s
      |  as -- assembler
      v
    foo.o

Copyright (c) 1999 by Robert C. Carden IV, Ph.D. 3/9/2016

Compiling process -- translation phases (ANSI)

1. Physical source file characters are mapped to the source character set (including new-line characters and end-of-file indicators) if necessary. Trigraph sequences are replaced by corresponding single-character internal representations.
2. Each instance of a new-line character and an immediately preceding backslash character (\) is deleted, splicing physical source lines to form logical source lines.
3. The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or comment. Each comment is replaced by one space character. New-line characters are retained.
4. Preprocessing directives are executed and macro invocations are expanded. A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4 recursively.
5. Each source character set member and escape sequence in character constants and string literals is converted to a member of the execution character set.
6. Adjacent character string literal tokens are concatenated and adjacent wide string literal tokens are concatenated.
7. White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token. The resulting tokens are syntactically and semantically analyzed and translated.
8. All external object and function references are resolved.
   Library components are linked to satisfy external references to functions and objects not defined in the current translation. All such translation output is collected into a program image which contains information needed for execution in its execution environment.

Compiling process (2)

    printf("Eh??/
    ???/n");

Phase 1 -- replace trigraphs: ??/ --> \
    printf("Eh\
    ?\n");
Phase 2 -- splice lines
    printf("Eh?\n");
Phases 3 and 4 change nothing visible here (there are no comments, directives, or macros).
Phase 5 -- convert \n to the newline character.
Phase 6 -- no effect.
Phase 7 -- convert into tokens:
    identifier        printf
    (
    string literal    "Eh?\n"
    )
    ;
Compile the token stream.

Definition of a preprocessing directive

Any source line in a source file which begins with a # character is called a preprocessing directive.
Traditional compilers require the # to be in column 1; ANSI C also allows white space before it.

File inclusion

Any source line of the form
    #include "filename"
or
    #include <filename>
is replaced by the contents of the file filename.
If filename is quoted (first form), searching typically begins in the directory where the source program is located.
If it is not found there, or if the name is enclosed in chevrons (< and >), searching follows an implementation-defined rule to find the file.
If the compiler cannot find the file, the compilation process stops (error).
An included file may itself contain #include lines.

Sample C source file

foo.h
    #define PI 3.14159
foo.c
    #include "foo.h"
    void foo() { printf("PI=%g\n", PI); }
After file inclusion
    #define PI 3.14159
    void foo() { printf("PI=%g\n", PI); }

File inclusion (2)

Including files within an include file

foo.h
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #define PI 3.14159
foo.c
    #include "foo.h"
    void foo() { printf("PI=%g\n", PI); }
After including file foo.h -- still more inclusion needed
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #define PI 3.14159
    void foo() { printf("PI=%g\n", PI); }

The line containing the #include "foo.h" directive is replaced with the contents of that file. From
there, the preprocessor reads through the file again, including any additional required includes.
In this example, it must also include the three standard library headers.

File inclusion under Unix

When one does a #include <filename.h>, the compiler will look for the file filename.h where it expects system include files to be located.
Under Unix, this is usually the directory /usr/include.
Some compilers will look in the current working directory (.) first, followed by the system include directory.
gcc (GNU's C compiler) looks first in /.../gnu/lib/gcc-include, followed by /usr/include.
The GNU compiler does not look in the current working directory (.) by default for system include files.
When one does a #include "ace/filename.h", the compiler will look for the file filename.h in the “ace” subdirectory of each directory it searches.

Modifying the default compiler search paths

Most C compilers support the -I option.
This allows users to modify the search path for system include files and for user include files.
The syntax of the -I option is as follows:
    -I <directory>
There may be more than one -I option on the cc command line.
These directories are prepended onto the default search path.

Example search path modifications

% cc -I/user/john/include foo.c
    #include "foo.h" searches
        ./foo.h
        /user/john/include/foo.h
    #include <foo.h> searches
        /user/john/include/foo.h
        /usr/include/foo.h

% cc -I. -I../include foo.c
    #include "foo.h" searches
        ./foo.h
        ../include/foo.h
    #include <foo.h> searches
        ./foo.h
        ../include/foo.h
        /usr/include/foo.h

File inclusion -- comments

One uses #includes to group common #define statements, extern declarations, and other declarations shared between files.
Among other things, they are used to declare common functions.
They are also used to access the declarations of library functions from headers such as <stdio.h>.
  On most systems, such files exist somewhere.
  On a Unix system, they would exist in the directory /usr/include.
  On a personal computer, they would probably exist in an INCLUDE directory somewhere within the compiler's environment.
  However, strictly speaking, these need not be files.
    This is indeed the case with the Unisys A-Series C compiler.
    The compiler generates the code directly.
    The #include serves more as a compiler directive.
    The net effect is the same.
If you use a standard library function, always include its header.

Macro substitution

A macro definition has the form
    #define name replacement-text
Subsequent occurrences of the token name will be replaced by the replacement-text.
Example: a file with the lines
    #define PI 3.14159
    void foo() { printf("PI = %g\n", PI); }
is transformed into the following file after the #defines are processed:
    void foo() { printf("PI = %g\n", 3.14159); }
Notice that the PI token was changed to the replacement text, but the PI inside the string literal was not.

Macro substitution -- illustration

    #define PI 3.14159
    void foo() { printf("PI = %g\n", PI); }

Phase 3 defines tokens; the printf call consists of:
    printf           identifier
    (                left-parenthesis
    "PI = %g\n"      string literal
    ,                comma
    PI               identifier
    )                right-parenthesis
    ;                semicolon

Phase 4 replaces the preprocessor token PI with 3.14159:
    void foo() { printf("PI = %g\n", 3.14159); }

The PI identifier (name) is replaced by the replacement text 3.14159.
The string literal is NOT touched.

Macro substitution (2)

The replacement-text is all text to the end of the line.
If the line ends with a backslash (\), then the
macro is continued onto the next line.
The scope of a macro runs from its #define to the end of the source file being compiled (or to a matching #undef).
A macro's replacement text may use any macro that is defined at the point of invocation.
Macros, however, cannot be recursively defined.

Original source:
    #define PI P+I
    #define P 3
    #define I .14159
    void foo() { printf("PI = %g\n", PI ); }
After PI is replaced:
    #define P 3
    #define I .14159
    void foo() { printf("PI = %g\n", P+I ); }
After P and I are replaced:
    void foo() { printf("PI = %g\n", 3+.14159 ); }
The value the compiler (not the preprocessor) computes for 3+.14159:
    void foo() { printf("PI = %g\n", 3.14159 ); }

Macro substitution (3)

Macros are expanded by literally substituting the replacement-text for the macro identifier.
This can lead to problems if the user is not careful.
Example:
    #define FOO 100
    #define FOOFOO 100*100

    void foo (void)
    {
        float x, y, z, t, v;

        x = FOO;
        y = FOOFOO;
        z = 100 / FOOFOO;
        t = 100 / y;
        v = 1 / FOO;
        if (z == t) {
            printf("z is equal to t\n");
        } else {
            printf("z is not equal to t\n");
        }
    }
What gets assigned to x, y, z, t, and v?
What is the output of function foo?

Potential bug -- ending a macro definition with a semicolon

A common mistake is to place a semicolon at the end of a macro definition:
    #define SIZE 10;
The problem is that the semicolon becomes part of the replacement text.
Thus, the statement
    x = SIZE;
expands to
    x = 10;;
While that will compile and probably work fine, the following example will not compile, because it expands to int y = 10;, z;
    int y = SIZE, z;
A more pernicious example is where we write
    #define GOOD_CONDITION (var == 1);
    ...
    while GOOD_CONDITION foo();
This expands to
    while (var == 1); foo();
which is definitely not what we wanted: the loop body is now the empty statement, and foo() is called once, after the loop.

Potential bug -- using = to define a macro

Another common mistake people make when defining macros is to include the = sign, as if one were initializing a variable.
Algol programmers should make a special note here.
Thus, instead of writing
    #define MAX 100
one mistakenly writes (by analogy with const int MAX = 100;)
    #define MAX = 100
The replacement text for MAX becomes "= 100".
This can lead to some extremely obscure bugs.
For instance, suppose we write
    for (i = 0; i<MAX; i++) { ... }
With a traditional preprocessor, which substitutes text, this can expand to
    for (i = 0; i<= 100; i++) { ... }
Similarly, if we write
    for (j =MAX; j > 0; j--) { ... }
this can expand to
    for (j == 100; j > 0; j--) {...}
This last example changes the assignment expression into a relational expression, thus leaving j uninitialized.
With such a preprocessor both examples compile, but one would expect very unpredictable results.
An ANSI preprocessor substitutes tokens rather than text, so < and = (or = and =) remain separate tokens and the compiler reports a syntax error instead.

Parameterized macros

Macros in C may be parameterized.
The general syntax for a parameterized macro is as follows:
    #define identifier(identifier, ..., identifier) replacement-text
Now, along with the macro identifier, actual arguments are expected when using the macro.
Whenever a formal parameter is encountered in the replacement text, the corresponding argument is substituted for it, just as if the parameter were a macro itself.
Example:
    #define SQ(x) ((x)*(x))
    void foo (void)
    {
        int a = 8, b, c, d;
        b = SQ (a);
        c = SQ (a + b);
        d = SQ (SQ (a));
    }
gets expanded to...
    void foo (void)
    {
        int a = 8, b, c, d;
        b = ((a)*(a));
        c = ((a + b)*(a + b));
        d = ((((a)*(a)))*(((a)*(a))));
    }
The heavy use of parentheses protects against an expansion whose operands bind in an unanticipated order.

Parameterized macros (2)

Consider rewriting the previous example as follows...
    #define SQ(x) x*x
    void foo()
    {
        int a = 8, b, c, d;
        b = SQ (a);
        c = SQ (a + b);
        d = SQ (SQ (a));
    }
This would get expanded to the following...
    void foo (void)
    {
        int a = 8, b, c, d;
        b = a*a;
        c = a + b*a + b;
        d = a*a*a*a;
    }
The assignment to c is wrong because of operator precedence: b*a is evaluated before the additions.
Now suppose we define SQ as follows:
    #define SQ(x) (x)*(x)
Then, the expression
    4 / SQ (2)
expands to
    4 / (2)*(2)
which was not what we wanted (it evaluates to 4, not 1).

Parameterized macros (3)

Spaces are not allowed between the macro name and the left parenthesis in the definition:
    #define SQ (x) ((x)*(x))
Calling this macro:
    y = SQ (7);
expands to
    y = (x) ((x)*(x)) (7);
Macros are often written to replace function calls by inline code.
The following is a macro to find the minimum of two values:
    #define min(x,y) (((x)<(y))?(x):(y))
After this definition, an expression such as
    m = min (u,v);
gets expanded to
    m = (((u)<(v))?(u):(v));
Notice that when we defined the parameterized macro, we did not specify the type of its operands.
This is because macros do not care about the type of operands.
They simply expand; the resulting expression must be valid for the operand types.
Thus, our macro min may be used to find the minimum of integers or floats or pointers, or anything for which < is defined.
We may also use it as part of another macro definition.
If we wish to find the minimum of four values, we can write
    #define min4(a,b,c,d) min(min(a,b),min(c,d))

Parameterized macros (4)

Macros are not function calls.
The arguments to a macro are evaluated each time they are referenced within the replacement text.
The following source code illustrates a potential pitfall:
    #define max(x,y) ((x)>(y)?(x):(y))
    void foo (void)
    {
        int i=1, j=2, x;
        x = max (i++, j++);
    }
After the macros have been expanded, this becomes
    void foo (void)
    {
        int i=1, j=2, x;
        x = ((i++)>(j++)?(i++):(j++));
    }
This has the potentially undesirable effect of incrementing (in this case) i once and j twice.
Because the code is substituted inline, the macro
argument is evaluated each time it is encountered.
This is similar to an Algol call-by-name parameter.
Because of this discrepancy, it is important to know when a particular operation is a macro as opposed to being an actual function.
Documentation will often specify that a given routine may be implemented either as a macro or as a function.
In this case, one should assume that it is a macro.

No type checking is done for macro arguments

One advantage and possible disadvantage of a macro is that no type checking is done on the macro arguments.
This follows from the fact that the arguments are expanded inline.
Consider the following example:
    #define DOUBLE_IT(x) ((x)+(x))
This may seem equivalent to the following function:
    int double_it (int x) { return x+x; }
However, calling double_it(2.5) is not the same as calling DOUBLE_IT(2.5).
With the prototype in scope, double_it(2.5) converts its argument to the int 2 and returns 4, whereas DOUBLE_IT(2.5) expands to ((2.5)+(2.5)) and yields 5.0.
One could rewrite this to be
    double double_it(double x) { return x+x; }
But this would be less efficient when calling double_it(2), because we would be using floating point arithmetic to do integer addition.
Using a macro allows us to utilize the most efficient implementation depending on the context of the operation.

ANSI feature -- Using a macro name in its own definition

Most older C compilers do not allow this feature.
ANSI recognizes that macros cannot be recursive and thus defines what it means for a macro to reference itself.
Consider the following example:
    #include <math.h>
    #define sqrt(x) (((x) < 0) ? sqrt(-(x)) : sqrt(x))
An older compiler would fail because it would try to expand sqrt within the body of sqrt itself.
The ANSI standard specifies that if a macro name appears within its own definition, it will not be expanded.
Thus, the invocation of
    y = sqrt(5);
would expand to
    y = (((5) < 0) ?
        sqrt(-(5)) : sqrt(5));
In this example, the sqrt() function would be called with 5 as its argument.

The macros in <stdio.h> and <ctype.h>

The C standard library includes two macros defined in <stdio.h>.
The first macro is used to get a character from standard input.
The second macro is used to write a character to standard output.
    #define getchar()   getc (stdin)
    #define putchar(c)  putc ((c), stdout)
Using the macro instead of calling the actual functions directly is just as efficient because the code is substituted in-line.
The only time lost (so to say) is the compile time, i.e. it takes a little longer to compile the file.
The header <ctype.h> contains macros to do character tests.
These macros should be assumed to take an argument of type int and return a value of type int.
The following table describes these macros:

    macro          nonzero (true) is returned if...
    isalpha(c)     c is a letter
    isupper(c)     c is an uppercase letter
    islower(c)     c is a lowercase letter
    isdigit(c)     c is a digit
    isalnum(c)     c is a letter or digit
    isxdigit(c)    c is a hexadecimal digit
    isspace(c)     c is a white space character
    ispunct(c)     c is a punctuation character
    isprint(c)     c is a printable character
    isgraph(c)     c is printable, but not a space
    iscntrl(c)     c is a control character
    isascii(c)     c is an ASCII code

The macros in <stdio.h> and <ctype.h> (2)

In some cases, the standard will specify that a given routine may be either a function or a macro.
The following table gives functions from <ctype.h> which convert characters to other formats.
The standard specifies that these may be either functions or macros.

    function or macro    effect
    toupper(c)           changes c from lowercase to uppercase
    tolower(c)           changes c from uppercase to lowercase
    toascii(c)           changes c to ASCII code

In older versions of C, toupper(c) and tolower(c) work only if c is lowercase or uppercase respectively.
The problem in these older versions is that the transformation would be applied to c even if it were not a lowercase or uppercase letter respectively.
To be safe, one should test whether the character needs to be converted before converting it.
The following macros do just that...
    #define lowercase(c) (isupper(c)? tolower(c):(c))
    #define uppercase(c) (islower(c)? \
        toupper(c):(c))

Undefining macros and not calling them

A macro may be deleted by undef-ing it:
    #ifdef toupper
    #undef toupper
    #endif /* toupper */
This removes any previous definition of the macro toupper.
It is not erroneous to apply #undef to an unknown identifier.
Putting parentheses around an identifier will prevent the parameterized macro from being invoked.
However, putting spaces between the macro identifier (upon invocation) and the left parenthesis will still cause the macro to be called.

     1  #include <ctype.h>
     2
     3  int (tolower)(int c) { return (tolower (c)); }
     4
     5  extern char buffer[];
     6  extern int bufsize;
     7
     8  void foo (void) {
     9      int i, c;
    10      for (i = 0; i < bufsize; i++) {
    11          c = buffer[i];
    12          buffer[i] = toascii (c);
    13      }
    14      for (i = 0; i < bufsize; i++) {
    15          buffer[i] = (tolower)(buffer[i]);
    16      }
    17  }

On line 12, the macro toascii is invoked even though there is a space between the macro identifier and its argument list.
  This follows from the discussion on page 230 of K&R, 2nd edition.
  The discussion in A Book on C is wrong on this point.
On line 15, the function tolower is invoked.
  If no such function exists, then the program will not link.

Advantages of using macros as compared to functions

1. Macros are usually faster than functions since they avoid the function call overhead.
2. The number of macro arguments is checked to match the definition.
   This is also done for functions that use the ANSI prototyping syntax.
   On older C systems, users may have to use traditional C syntax for functions.
   Thus, on older systems, prototypes may not be available.
3. No type restriction is placed on arguments, so that one macro may serve for several data types.

Disadvantages of using macros in place of functions

1. Macro arguments are re-evaluated at each mention in the macro body.
   This can lead to unexpected behavior if an argument has side effects.
2.
   Function bodies are compiled once so that multiple calls to the same function can share the same code without repeating it each time. Macros, on the other hand, are expanded each time they appear in the program.
   A program with many large macros may be longer than a program that uses functions in place of macros.
   If the macro is defined in a header, then if one changes the macro, all files including the header must be recompiled to realize the change in the macro; if it were a function, only the file implementing the function need be recompiled.
3. Though macros check the number of arguments, they don't check the argument types. ANSI function prototypes check both the number of arguments and the argument types.
4. It is more difficult to debug programs that contain macros because the source code goes through an additional layer of translation, making the object code even further removed from the source code.

Conditional compilation

Various constructs exist to control compilation in C.
The preprocessor has directives for conditional compilation.
Each such construct begins with one of the directives
    #if constant_integer_expression
    #ifdef identifier
    #ifndef identifier
Each of these provides for the conditional compilation of the code that follows, up to the matching #endif, #else, or #elif.
The following must hold for the intervening code to be compiled:

    directive                            requirement for code to be compiled
    #if constant_integer_expression      constant_integer_expression must be true (nonzero)
    #ifdef identifier                    the named identifier must be defined
    #ifndef identifier                   the named identifier must not be defined

The constant integral expression may not contain a sizeof operator or a cast.
  This is because the expression is evaluated by the preprocessor, not the C compiler.
It may use the special preprocessing operator defined.
Users should note that the defined operator may not exist in older versions of C.

Conditional compilation (2)

#if
    Evaluates a constant integer expression (which
    may not include sizeof, casts, or enum constants).
    If this expression is non-zero, subsequent lines until an #endif, #elif, or #else are included.
    Otherwise, those lines are converted into whitespace.
    The expression defined(name) in a #if is 1 if name has been defined, 0 otherwise.
#ifdef name
    Equivalent to #if defined(name)
#ifndef name
    Equivalent to #if !defined(name)
#else
    May appear within a #if, #ifdef, or #ifndef construct.
    Code between it and the #endif is compiled if and only if the preceding code was not.
#elif
    Similar to #else, but it takes its own condition, much like the else-if construct in C.

Application -- prevent multiple inclusion of header files

In C, a lot of problems can occur if a header file is included more than once within a given source file.

bar.h
    #include <stdio.h>
    #include "foo.h"
    ...
foo.c
    #include <stdio.h>
    #include "foo.h"
    #include "bar.h"
    ...

In this example, foo.c includes files foo.h and <stdio.h> twice.
This is not necessarily a bad thing as long as the headers are written in such a way as to allow for multiple inclusion.
We can add several lines to each of our header files...

foo.h
    #ifndef FOO_H
    #define FOO_H
    #include <stdio.h>
    ...
    #endif /* FOO_H */
bar.h
    #if !defined (BAR_H)
    #define BAR_H
    #include <stdio.h>
    #include "foo.h"
    ...
    #endif /* BAR_H */

File foo.c includes foo.h followed by bar.h.
When foo.h is being compiled as the result of the first (direct) inclusion:
  FOO_H is not defined
  the code surrounded by the #ifndef is compiled
  FOO_H is immediately defined
When bar.h is being compiled:
  BAR_H is not defined
  the code surrounded by the #if !defined is compiled
  BAR_H is immediately defined
When foo.h is being compiled as the result of being included by bar.h:
  FOO_H is defined
  the code surrounded by the #ifndef is not compiled

Application -- isolate machine dependent code

Sometimes it is necessary to do things which are inherently machine dependent.
Suppose, for example, we have an application which needs to know the page size of the operating system.
In general, one would expect to call an operating system function to return the page size.
However, if no such function exists, it is sufficient to make an educated guess.

    #if defined (unix)
    /* use unix system call getpagesize(2) */
    extern int getpagesize (void);
    #endif /* unix */

    long pagesize (void)
    {
    #if defined (unix) /* predefined macro */
        return getpagesize ();
    #else
    # if defined (__ASERIES__)
    #  if defined (LONGLIMIT)
        return LONGLIMIT * 6;
    #  else /* a pretty good guess */
        return 0x1000 * 6;
    #  endif /* LONGLIMIT */
    # else /* heck, I don't know :: good enough */
        return 1024;
    # endif /* __ASERIES__ */
    #endif /* unix */
    }

Another approach to isolating machine dependent code

Another common approach is to assume that the function getpagesize() exists, and to implement it ourselves on any system that lacks it.
Typically, there will be a file getpagesize.c which implements it.
This is good for cases where a program has already been written to use getpagesize() and we just want to provide it in case a given system doesn't have such a function.

getpagesize.c
    #include "allocate.h"

    #ifdef NOGETPAGESIZE
    int getpagesize (void)
    {
    #ifdef __ASERIES__
    # ifdef LONGLIMIT /* LARGE MEMORY MODEL */
        return LONGLIMIT * 6;
    # else /* an EWAG at an A-Series page size */
        return 0x1000 * 6;
    # endif /* LONGLIMIT */
    #else /* HECK, I DON'T KNOW */
        return 1024;
    #endif /* __ASERIES__ */
    }
    #endif /* NOGETPAGESIZE */

Macros may reference other macros which have not yet been defined

system.h
     1  #define SYSV 100
     2  #define BSD 200
     3  #define MSDOS 300
     4
     5  #ifdef SYSTEM
     6  # if SYSTEM == SYSV
     7  # define SYSTEM_H "sysv.h"
     8  # elif SYSTEM == BSD
     9  # define SYSTEM_H "bsd.h"
    10  # elif SYSTEM == MSDOS
    11  # define SYSTEM_H "msdos.h"
    12  # else
    13  # error unknown SYSTEM specified
    14  # endif
    15  #else
    16  # error no SYSTEM specified
    17  #endif /* SYSTEM */
    18
    19  #include SYSTEM_H

foo.c
     1  #ifndef SYSTEM
     2  # if defined (USG)
     3  # define SYSTEM SYSV
     4  # elif defined (unix)
     5  # define SYSTEM BSD
     6  # elif defined (dos)
     7  # define SYSTEM MSDOS
     8  # endif /* USG */
     9  #endif /* SYSTEM */
    10  #include "system.h"

% cc foo.c
% cc -DSYSTEM=SYSV foo.c
% cc -DSYSTEM=ASERIES foo.c

Other preprocessing directives

Error generation (ANSI)

The following preprocessor directive is available under ANSI C:
    #error (token-sequence)?
This causes the compiler to write a diagnostic message that includes the token sequence.

Example -- lines 5 through 17 of system.h from the previous page:
     5  #ifdef SYSTEM
     6  # if SYSTEM == SYSV
     7  # define SYSTEM_H "sysv.h"
     8  # elif SYSTEM == BSD
     9  # define SYSTEM_H "bsd.h"
    10  # elif SYSTEM == MSDOS
    11  # define SYSTEM_H "msdos.h"
    12  # else
    13  # error unknown SYSTEM specified
    14  # endif /* SYSTEM == SYSV */
    15  #else
    16  # error no SYSTEM specified
    17  #endif /* SYSTEM */

% gcc foo.c
In file included from foo.c:10:
system.h:16: #error no SYSTEM specified
% gcc -DSYSTEM=SYSV foo.c
% gcc -DSYSTEM=ASERIES foo.c
In file included from foo.c:10:
system.h:13: #error unknown SYSTEM specified

Other preprocessing directives (2) -- line control

The #line directive may be used to change which file and line number the compiler thinks it is currently processing:
    #line constant "filename"
    #line constant
This causes the compiler to believe, for error diagnostics, that the line number (and filename) of the next line is constant (and filename) respectively.

parse.y
     99  program      : command_list
    100               ;
    101
    102  command_list : command_list command
    103               | command
    104               ;
    105
    106  command      : HALT
    107                   { /* forget semicolon */
    108                     printf ("Adios!\n")
    109                     exit (0);
    110                   }

Yacc (Bison) then generates a corresponding C program file...

parse.c
    654  case 4:
    655  #line 107 "parse.y"
    656  { /*forget semicolon*/
    657  printf("Adios!\n")
    658  exit(0);
    659  ;
    660  break;}

Thus, when we compile parse.c, the error diagnostics will point us to the appropriate line in parse.y.

Other preprocessing directives (3) -- the #pragma directive (ANSI)

The following preprocessor directive is available under ANSI C:
    #pragma (token-sequence)?
This performs implementation-specific tasks.
Each compiler is free to support special names that have implementation-defined behavior when preceded by a #pragma.
For instance, a compiler might support the names NO_SIDE_EFFECTS and END_NO_SIDE_EFFECTS, which inform the compiler whether it needs to worry about side effects for a certain block of statements.
Consider the following code fragment:
    #pragma NO_SIDE_EFFECTS
    a = fn (x, 2);
    *p = 2;
    #pragma END_NO_SIDE_EFFECTS
In this example, the pragma is used to help the compiler generate more efficient code by telling it that it can do *p = 2 before the call to fn, because the call to fn will not produce any side effects, and likewise the change in *p will not affect fn.
Note that pragmas are compiler dependent and cannot be expected to be portable.
An unknown pragma should at worst generate a warning.

Other preprocessing directives (4)

The null directive

A line with simply a # on it has no effect, e.g.
    #

Predefined macro names (ANSI)

The ANSI standard defines five macro names that are built into the preprocessor.
Each name begins and ends with two underscore characters.
You may not redefine or #undef these macros.
Older compilers may support some but probably not all of these macros.

    macro       expanded value
    __LINE__    Expands to the source file line number on which it is invoked (int) -- available on most older compilers
    __FILE__    Expands to the name of the file in which it is invoked (char []) -- available on most older compilers
    __TIME__    Expands to the time of program compilation (char []) -- ANSI
    __DATE__    Expands to the date of program compilation (char []) -- ANSI
    __STDC__    Expands to the constant 1 if the compiler conforms to the ANSI standard

Using predefined macros

The __LINE__ and __FILE__ macros are valuable for diagnosing programs.
We can implement a macro that compares two expressions and, if they are not equal, prints out a diagnostic.
We first implement the function which
prints out the diagnostic:
    void fail (int a, int b, char p[], int line)
    {
        printf ("Check failed in file %s at line %d: ", p, line);
        printf ("received %d, expected %d\n", a, b);
    }
Then, in a common header, we could define
    #define CHECK(a, b) \
        if ((a) != (b)) \
            fail (a, b, __FILE__, __LINE__)
Then, anywhere within the program we could check to see if, for instance, a variable x equals 0 by including the following diagnostic:
    CHECK (x, 0);

Using predefined macros (2)

Similarly, we can use the __DATE__ and __TIME__ macros for recording the time and date that the file was last compiled.
The following procedure will print out the date and time when that file was last compiled:
    void print_version (void)
    {
        printf ("This file last compiled on ");
        printf ("%s at %s\n", __DATE__, __TIME__);
    }
The __STDC__ macro, if it expands to 1, signifies that the compiler conforms to the ANSI standard.
If it expands to any other value, or if it is not defined, one should assume that the compiler does not conform to the ANSI standard.
In general, the existence of __STDC__ implies that the compiler can handle function prototypes.
However, certain ANSI headers, such as <stdlib.h> and <stdarg.h>, may not exist.
Those headers should exist if __STDC__ expands to 1.
    #ifdef __STDC__
    /* compiler understands prototypes */
    # if __STDC__ == 1
    /* system is ANSI compliant */
    # else
    /* system is almost ANSI compliant */
    # endif /* __STDC__ == 1 */
    #else
    /* traditional C compiler */
    #endif /* __STDC__ */

“Stringification” -- ANSI feature

One of the limitations of the preprocessor described in the first edition of K&R is that there is no way to treat a series of characters as both a string and an expression.
However, with an ANSI conforming compiler, one can produce this behavior by using the preprocessor operator #.
This makes the preprocessor surround the argument substituted for the parameter that follows it with double quotes.
The preprocessor operator # may be applied only to formal
parameters of macros.
    #define message_for(a, b) \
        printf (#a " and " #b \
                ": please report to the front desk\n")

    main()
    {
        message_for (Wilma, Fred Flintstone);
    }
The output of the program is
    Wilma and Fred Flintstone: please report to the front desk
The macro message_for (Wilma, Fred Flintstone) expands to
    printf("Wilma" " and " "Fred Flintstone"
           ": please report to the front desk\n");
Then, adjacent string literals are concatenated into one big string literal.

The assert macro from <assert.h>

A useful macro available on almost all C installations, old and new, is the assert macro.
This macro is defined in the header <assert.h>.
It allows a user to test whether an expression is true.
  If it is, nothing happens.
  If it is false, then an error diagnostic is printed and the program is terminated.
  Conceptually, it says to assert that this condition is true.

foo.c
     1  #include <stdio.h>
     2  #include <assert.h>
     3
     4  double compute_mean (double a[], int n)
     5  {
     6      double total;
     7      int i;
     8
     9      assert ("Bad size" && n > 0);
    10      for (i = 0, total = 0; i < n; i++)
    11          total += a[i];
    12
    13      return total / n;
    14  }

If compute_mean() were called with n either 0 or negative, the following run-time error would occur:
Traditional C
    Failed assertion at line 9 of `foo.c'
ANSI C
    Failed assertion `"Bad size" && n > 0' at line 9 of `foo.c'

The assert macro from <assert.h> (2)

The assert macro, by default, is enabled.
Users may turn it off by defining the macro NDEBUG.
This may be done explicitly before including the <assert.h> header:
    #define NDEBUG
It may also be done by defining it on the cc command line, e.g.:
    % cc -DNDEBUG -c foo.c
In fact, any macro may be defined on the cc command line using the -D command line option:
    -D<macro>
    -D<macro>=<replacement-text>
When we define a macro on the compiler command line, it gets defined before any source code is read.
If NDEBUG is defined, then the assert macros will do nothing.
This is useful if one is writing
production code. The assert macros will be enabled during the test and debug phase, but can then be disabled when the code goes into production.

The assert macro from <assert.h> (3)

Implementation in traditional C

assert.h -- traditional C implementation

    #ifdef NDEBUG
    #define assert(expr)
    #else
    #define __assert {\
        fprintf(stderr, "Failed assertion at ");\
        fprintf(stderr, "line %d of `%s'\n",\
                __LINE__, __FILE__);\
        abort();\
    }
    #define assert(expr)\
        if ((expr) == 0) __assert
    #endif /*NDEBUG*/

The assert macro from <assert.h> (4)

Implementation in ANSI C

assert.h -- ANSI C implementation

    #ifdef NDEBUG
    #define assert(expr)
    #else
    # ifndef __STDC__
    #  define __assert {\
          fprintf(stderr, "Failed assertion at ");\
          fprintf(stderr, "line %d of `%s'\n",\
                  __LINE__, __FILE__);\
          abort();\
      }
    #  define assert(expr)\
          if ((expr) == 0) __assert
    # else
    #  define __assert(expression_string) {\
          fprintf(stderr, "Failed assertion `%s'", expression_string);\
          fprintf(stderr, " at line %d of `%s'\n",\
                  __LINE__, __FILE__);\
          abort();\
      }
    #  define assert(expr)\
          if ((expr) == 0) __assert(#expr)
    # endif
    #endif /*NDEBUG*/

Token pasting -- the ## operator (ANSI)

The ANSI standard defines a new preprocessor operator ## that pastes two lexical tokens together. Consider the following example:

    #define mask_0   0x1
    #define mask_1   0x2
    #define mask_2   0x4
    #define mask_3   0x8

    #define shift_0  0
    #define shift_1  1
    #define shift_2  2
    #define shift_3  3

    #define BIT_nn(x,n)    ((x) & mask_ ## n)
    #define SET_nn(x,n)    ((x) | 0x1 << shift_ ## n)
    #define RESET_nn(x,n)  ((x) & ~ mask_ ## n)

    void foo (unsigned int x)
    {
        x = SET_nn (x, 1);
        x = RESET_nn (x, 2);
        if (BIT_nn (x, 3))
            printf ("foo\n");
    }

The function foo() expands as follows:

    void foo (unsigned int x)
    {
        x = ((x) | 0x1 << shift_1);
        x = ((x) & ~ mask_2);
        if (((x) & mask_3))
            printf("foo\n");
    }

The preprocessor then
expands this by expanding the resultant macros:

    void foo(unsigned int x)
    {
        x = ((x) | 0x1 << 1);
        x = ((x) & ~ 0x4);
        if (((x) & 0x8))
            printf("foo\n");
    }

Token pasting -- the ## operator (2)

Consider the following macro:

    #define FILENAME(extension) test_ ## extension

The code fragment

    void foo (void)
    {
        FILE *FILENAME(1), *FILENAME(2);
        FILENAME(1) = fopen ("test.1", "r");
        FILENAME(2) = fopen ("test.2", "w");
    }

expands to

    void foo (void)
    {
        FILE *test_1, *test_2;
        test_1 = fopen ("test.1", "r");
        test_2 = fopen ("test.2", "w");
    }

Notice that the paste operator produces new lexical tokens.

Token pasting -- the ## operator (3)

A more interesting example is one which includes a particular version of some header file. Because the operands of # and ## are not macro-expanded before the operator is applied, each operator is wrapped in a helper macro (the names xstr and paste are chosen here for illustration) so that VERSION is replaced by its value first.

    #ifndef VERSION
    #define VERSION 3
    #endif /* VERSION */

    #define str(s)       #s
    #define xstr(s)      str (s)
    #define paste(a, b)  a ## b
    #define FILENAME(n)  paste (db, n)

    #include xstr (FILENAME(VERSION).h)

The argument FILENAME(VERSION).h is macro-expanded before it is substituted into xstr: VERSION becomes 3 and FILENAME(3) becomes db3, so the #include line is equivalent to

    #include str (db3.h)

which finally expands to

    #include "db3.h"

We could then determine which file we include on the cc command line:

    % cc -DVERSION=4 foo.c

This would force "db4.h" to be included instead of "db3.h".

Conditional function prototypes

If one writes code that must be compiled under both an ANSI and a traditional C compiler, then one must use the traditional C approach for function definitions. However, it is possible to declare the functions with or without a prototype, depending on the type of compiler. Consider the following macro:

    #ifndef P
    # if __STDC__
    #  define P(s) s
    # else
    #  define P(s) ()
    # endif /* __STDC__ */
    #endif /* P */

One can then declare a function:

    int foo P((int c, double d, char *s));

and implement it

    int foo (c, d, s)
    char c;
    double d;
    char *s;
    {
        ...
    }

The function declaration expands to either

    int foo ();

(traditional C) or

    int foo (int c, double d, char *s);

(ANSI C), depending on whether __STDC__ is defined to be nonzero. Note that the macro parameter s always binds to the whole parenthesized list (int c, double d, char *s), which is why the invocation uses double parentheses. P then expands to that list if __STDC__ is defined to be nonzero, and to () otherwise.

Final remarks

Most C compilers predefine additional macros to help developers tailor their applications to the host environment. For instance, UNIX systems define the macro unix, Apollo systems define the macro apollo, and Sun systems typically define the macro sun. Thus, one often isolates machine-dependent code:

    #ifdef sun
    /* do sun stuff */
    #endif /* sun */

    #ifdef apollo
    /* do apollo stuff */
    #endif /* apollo */

More recently, the Unisys A-Series defines __ASERIES__. Such macros are usually listed in the compiler documentation.

Example: the GNU C compiler gcc defines the following macros on an Apollo DN3000:

    apollo  aegis  unix  mc68020  __apollo__  __aegis__  __unix__
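
As a rough sketch of how host-specific predefined macros can isolate machine-dependent code, the functions below pick a platform name at compile time. The names host_platform and print_host_platform and the returned strings are made up for illustration. The macros sun, apollo, and unix come from the remarks above, while _WIN32, __APPLE__, __linux__, __sun, and __unix__ are assumed to be predefined by the corresponding present-day compilers, so the compiler documentation should be checked before relying on any of them.

```c
#include <stdio.h>

/*
 * Return a short name for the host platform (hypothetical helper).
 * Exactly one branch survives preprocessing; the others are discarded
 * before the compiler proper ever sees them.
 */
const char *host_platform(void)
{
#if defined(_WIN32)
    return "windows";
#elif defined(__APPLE__)
    return "apple";
#elif defined(__linux__)
    return "linux";
#elif defined(sun) || defined(__sun)
    return "sun";
#elif defined(apollo)
    return "apollo";
#elif defined(unix) || defined(__unix__)
    return "unix";
#else
    return "unknown";   /* none of the assumed macros is defined */
#endif
}

/* Print the platform chosen above, plus whether __STDC__ is 1. */
void print_host_platform(void)
{
    printf("host platform: %s\n", host_platform());
#if defined(__STDC__) && __STDC__ == 1
    printf("compiler claims ANSI conformance\n");
#else
    printf("compiler does not claim ANSI conformance\n");
#endif
}
```

Calling print_host_platform() from main reports the branch that was selected. The more specific tests come first because several of these macros can be defined at once (a Linux compiler typically defines a unix macro as well), and the final #else keeps the code compiling on hosts the list does not anticipate.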