
Parsing and Validating Text Input
file opening and closing
fprintf, fscanf and sscanf
fgets and fputs
fgetc and putc
Parsing a Token Delimited Input Record
Example Program using strtok
Input Validation Approaches
Checking for safe or dangerous input
Except for stdin, stdout and stderr, files have to be
opened before reading or writing. fopen() opens a
file and returns a filehandle or NULL on error.
fclose() closes a filehandle.
Example 1:
#include <stdio.h>
#define IN "master.txt"
#define OUT "backup.txt"
Example 1 continued
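The body of Example 1 is not reproduced here beyond its #include and #define lines. As a minimal sketch only, assuming the example opens IN for reading and OUT for writing, checks both results and closes both files, it might look like this. The function name open_both() and its return codes are hypothetical.

```c
#include <stdio.h>

#define IN "master.txt"
#define OUT "backup.txt"

/* Hypothetical helper: opens in_name for reading and out_name for
   writing, then closes both. Returns 0 on success, 1 on failure. */
int open_both(const char *in_name, const char *out_name)
{
    FILE *in, *out;

    if ((in = fopen(in_name, "r")) == NULL) {
        fprintf(stderr, "cannot open %s for reading\n", in_name);
        return 1;
    }
    if ((out = fopen(out_name, "w")) == NULL) {
        fprintf(stderr, "cannot open %s for writing\n", out_name);
        fclose(in);
        return 1;
    }
    /* ... reading from in and writing to out would go here ... */
    fclose(in);
    if (fclose(out) == EOF)   /* buffered output can fail on close */
        return 1;
    return 0;
}
```

A caller would use it as open_both(IN, OUT) and treat a non-zero result as an error.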
fprintf and fscanf
fscanf() and fprintf() work in the same way as printf
and scanf, the difference being the extra first
filehandle parameter is used to direct input/output
to/from an opened file.
fscanf() has no inbuilt protection against buffer
overflows, or data in the input file being
incompatible with the data wanted. It stops reading
a field on encountering whitespace. What if the
field contains spaces, tabs or newlines ?
With sscanf() the first parameter is the string it reads and
converts. This can be useful if the string has been validated e.g.
as a number, and string fields don't contain embedded
whitespace.
fscanf(stdin, args ... ) is the same as scanf( args ... ) .
fprintf(stdout, args ... ) is the same as printf( args ... ) .
sscanf(string, args ... ) is like fscanf but scans and parses a string
instead of a file.
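As a small illustration of sscanf(), the sketch below converts three fields from a string and uses the return value to confirm that all of them were converted. The function name parse_person() and the field layout are assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical helper: expects text such as "Smith 25 64.3".
   Returns 1 if all three fields were converted, 0 otherwise. */
int parse_person(const char *text, char name[20], int *age, double *weight)
{
    /* %19s limits the name to 19 characters plus '\0';
       sscanf() returns the number of fields converted */
    return sscanf(text, "%19s %d %lf", name, age, weight) == 3;
}
```

The field width in %19s is what stops an over-long name from overflowing the 20-character array.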
fgets and fputs
fgets() reads a line of data from a file up to and including the
newline character: '\n' into a string and then appends the string
terminator character: '\0' after the newline. fgets() returns the
value NULL (not EOF !) when attempting to read beyond the
end of file. fgets() requires:
* the name of the string ( or any other pointer giving the
address at which it starts),
* the size of the string buffer: fgets() reads at most one
character fewer than this (to leave room for the '\0' end of
string marker)
* and the filehandle as parameters.
fputs() writes a string to a file such that fputs(string,out) is the
equivalent of fprintf(out,"%s",string).
fgets continued
The fgets() function is particularly useful for
robustly reading text files organised into records
separated using newlines as it contains built in
buffer overflow protection. Data can be input using
fgets() into a character string, validated to ensure
the correct number and types of data items are
present and then read from the string into local
program variables of the appropriate types using sscanf().
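The read, validate and convert sequence described above can be sketched as follows. The record layout (an item name followed by a quantity), the buffer sizes and the function name read_record() are assumptions for illustration; lines longer than the buffer are not handled specially in this sketch.

```c
#include <stdio.h>
#include <string.h>

#define RECORD_SIZE 80   /* assumed maximum record length */

/* Hypothetical helper: reads one "item quantity" record from in.
   Returns 1 on success, 0 on a malformed record, -1 at end of file. */
int read_record(FILE *in, char item[32], int *quantity)
{
    char line[RECORD_SIZE];
    char extra;

    if (fgets(line, sizeof line, in) == NULL)
        return -1;                      /* end of file or read error */
    line[strcspn(line, "\n")] = '\0';   /* remove the newline, if any */

    /* exactly two fields wanted: the trailing %c catches surplus data */
    if (sscanf(line, "%31s %d %c", item, quantity, &extra) != 2)
        return 0;
    return 1;
}
```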
fgetc and putc
These functions are the file-enabled equivalents of
getchar() and putchar(). They are used to read and
write single characters from and to files
respectively. getc() returns EOF if an attempt is
made to read beyond the end of file, so its result should be
stored in an int rather than a char. c=getc(in); is roughly
the equivalent of fscanf(in,"%c",&c); and
putc(c,out); is the equivalent of fprintf(out,"%c",c);
Copying file one character at a time
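The listing for this slide is not reproduced here. A minimal sketch of a character-by-character copy, assuming both files have already been opened, could look like this; the function name copy_chars() is hypothetical.

```c
#include <stdio.h>

/* Copies in to out one character at a time; returns the count copied. */
long copy_chars(FILE *in, FILE *out)
{
    int c;          /* int, not char, so that EOF can be distinguished */
    long count = 0;

    while ((c = getc(in)) != EOF) {
        putc(c, out);
        count++;
    }
    return count;
}
```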
Parsing a Token Delimited Record
The strtok() function in string.h helps make this job a
bit easier. The idea is to convert field delimiters into '\0'
null characters. On the first call strtok() is passed the address
of the string and returns the address of the start of field 1.
For fields >= 2 you pass it NULL instead, and it automatically
finds the address of the start of the next field, returning NULL
when no fields remain. Alternatively you can calculate and pass
the address of the start of the next field yourself.
These string addresses can be stored in an array
of char pointers for use later. This technique can be used to
input fields which include unknown numbers of space and
tab (\t) characters.
Warning concerning strtok
strtok() modifies the string it parses, by replacing field
delimiters with '\0' null characters. If this is a
problem, clone the string first using strcpy() and then parse
the clone. E.G.
char *clone;
/* using malloc() to avoid buffer overrun;
   "string" stands for the string to be parsed */
if( (clone = (char*) malloc(strlen(string) + 1))
== NULL )
exit(1); /* error if insufficient memory */
strcpy(clone, string);
/* must remember to free(clone) later */
strtok example program
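The listing for this example is not reproduced here. The following is a sketch only, assuming a colon-delimited record such as "Joseph Smith:64.3:25"; the delimiter, the record contents and the function names split_record() and print_record() are all hypothetical. Conversion with atof() and atoi() performs no validation and is used here only for brevity.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FIELDS 3

/* Splits record in place at each ':' and stores the field addresses.
   Returns the number of fields found (at most max_fields). */
int split_record(char *record, char *fields[], int max_fields)
{
    int n = 0;
    char *field = strtok(record, ":");   /* first call: pass the string */

    while (field != NULL && n < max_fields) {
        fields[n++] = field;
        field = strtok(NULL, ":");       /* later calls: pass NULL */
    }
    return n;
}

/* Prints a three-field record: name, weight, age.
   Returns 0 on success, 1 if fewer than three fields were found. */
int print_record(char *record)
{
    char *fields[MAX_FIELDS];

    if (split_record(record, fields, MAX_FIELDS) != MAX_FIELDS)
        return 1;
    printf("Name: %s\n", fields[0]);
    printf("Weight: %f\n", atof(fields[1]));
    printf("Age: %d\n", atoi(fields[2]));
    return 0;
}
```

Called with a writable array, e.g. char record[] = "Joseph Smith:64.3:25"; print_record(record); this sketch prints the three lines listed under the program output heading.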
strtok program output
Name: Joseph Smith
Weight: 64.300000
Age: 25
A thread-safe strtok
The static pointer variable value used internally within strtok()
won't survive concurrent use in a multi-threaded application. If
this is a problem, you can use the re-entrant version strtok_r(),
prototype defined in the POSIX.1-2001 standard as follows:
char *strtok_r(char *str, const char *delim, char **saveptr);
The saveptr has to be passed the address of a pointer variable
declared within the caller function, which enables the position
within the string being parsed to be remembered between
function calls.
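One way to see what saveptr buys is to run two tokenisers at once, which plain strtok() cannot do even in a single thread. The sketch below assumes a POSIX system; the function name count_words() and the ';' record separator are chosen for illustration only.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* request the POSIX declaration of strtok_r() */
#endif
#include <string.h>

/* Counts whitespace-separated words in every ';'-separated record of
   text. Two tokenisers are active at once, each with its own saveptr.
   text is modified in place. */
int count_words(char *text)
{
    char *rec_save, *word_save;
    char *rec, *word;
    int count = 0;

    for (rec = strtok_r(text, ";", &rec_save); rec != NULL;
         rec = strtok_r(NULL, ";", &rec_save)) {
        for (word = strtok_r(rec, " \t", &word_save); word != NULL;
             word = strtok_r(NULL, " \t", &word_save))
            count++;
    }
    return count;
}
```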
Input Validation Approaches
Is input likely to be perfect, clumsy or hostile ?
Perfect input assumes the person entering data will never use an
incorrect key on the keyboard. The program is otherwise allowed to fail.
Clumsy input is common for a stand-alone application. An application
is fragile and less usable if it crashes e.g. due to casual use of the
<enter> key by a user who hasn't read the prompt requesting input
data correctly.
Hostile input has to be assumed very likely if the application accepts
input data from non-authenticated users over the Internet. A
standalone application might later become a web-browser plugin.
Buffer Overflow Protection
A buffer overflow occurs when a program writes beyond or
outside allocated blocks of memory. Attackers may attempt to
write specific data into the executable part of a program, e.g.
vectoring execution into inserted code by overwriting a
function return address (stack smashing). The allocated block
might be a structure or character array, or a block allocated
dynamically using malloc().
Many network programs are compromised through buffer
overflows. fgets() allows the programmer to specify the
maximum buffer size which it will overwrite. Careful
programming is needed to ensure access can only be made
within allocated memory.
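As a sketch of how fgets() can be used defensively, the function below never writes outside the caller's buffer and reports when an input line was too long to fit. The name read_line() and its return codes are assumptions for this example.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line into buf (capacity size, which must be at least 2).
   Returns 0 if a complete line was read, 1 if the line was too long
   (the excess is discarded), or -1 at end of file or on a read error. */
int read_line(FILE *in, char *buf, size_t size)
{
    size_t len;
    int c;

    if (fgets(buf, (int)size, in) == NULL)
        return -1;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';          /* complete line: drop the newline */
        return 0;
    }
    if (len < size - 1)
        return 0;                     /* last line of file, no newline */

    /* buffer is full and holds no newline: look at what follows */
    c = getc(in);
    if (c == '\n' || c == EOF)
        return 0;                     /* the line fitted exactly */
    while ((c = getc(in)) != '\n' && c != EOF)
        ;                             /* discard the rest of the line */
    return 1;
}
```

Discarding the excess keeps the next call aligned with the start of the next line, so one over-long record does not corrupt the records after it.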
Hostile input example
A web-based calculator program reads data from an HTML form expected
to be in the format: a op b, where a and b are numbers and op is an
arithmetic operation e.g. +, -, * and / . A naive programmer has used a Perl
or Python eval() function upon this input data and writes the result of the
calculation to the web browser.
Mr Evil Cracker tests for this possibility with the form input:
This results in the output:
Traceback (most recent call last):
File "", line 1, in ?
File "", line 0, in ?
IOError: [Errno 13] Permission denied: '/etc/shadow'
Hostile input example 2
This shows that there is some rudimentary security on this system,
as the webserver program is not running with the administrator
privileges which would allow reading the shadow password file.
Mr Evil Cracker hasn't got a crackable form of the password hash
file yet, but he now knows that he can run any Python code on the
target system with the permissions of the webserver program. As
this allows him to create and execute other program files on this
server, all he now needs is to find a local privilege escalation
exploit, rather than a remote one. His chances of running a program
giving him full control of this server are now much greater.
Check safe or dangerous input ?
The problem with checking for dangerous input is that crackers will
know things about your system that you don't. Therefore you don't
really know what might be dangerous and what isn't so you can't
easily check for specifically dangerous data. However, safe input is
within the range of input values which you have designed your
program to handle.
If the required data is in the form of a string, what are the maximum
and minimum string lengths, and what characters should be allowed
in a string e.g. to input someone's name or address ? You should
reject anything not in your allowed, designed and tested range of
values, sending a suitable error message to the user so that input
mistakes can be corrected.
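Checking for safe input can be sketched as a test against an allowed length range and an allowed character set. The limits and the permitted characters below are assumptions chosen for illustration, as is the function name is_safe_name(); a real application would choose them from its own design.

```c
#include <ctype.h>
#include <string.h>

#define MIN_NAME_LEN 1     /* assumed limits, chosen for illustration */
#define MAX_NAME_LEN 40

/* Returns 1 if name has an allowed length and contains only letters,
   spaces, hyphens and apostrophes; returns 0 otherwise. */
int is_safe_name(const char *name)
{
    size_t len = strlen(name);
    size_t i;

    if (len < MIN_NAME_LEN || len > MAX_NAME_LEN)
        return 0;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        if (!isalpha(c) && c != ' ' && c != '-' && c != '\'')
            return 0;
    }
    return 1;
}
```

Anything outside the list is rejected, so the check does not depend on knowing in advance which characters an attacker might exploit.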
Checking safe input numbers
If you want to input numbers, you need to ensure that the data
string input can be safely converted to a number. You want to
consider what the range of acceptable numbers suitable for input
to your program should be:
* Minimum and maximum values, avoiding numeric overflows.
* Whether integer or floating point.
* What the maximum acceptable string length is for each input.
Numbers should always be input as strings and then validated
and converted.
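For integers, the checks listed above can be sketched with strtol(), which reports both trailing junk and overflow. The function name parse_long_in_range() and the length limit are assumptions for this example. Note that strtol() itself accepts leading whitespace and a sign.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NUMBER_LEN 10   /* assumed maximum input length */

/* Converts text to a long in [min, max]. Returns 1 and stores the value
   in *out on success; returns 0 if text is empty, too long, not a
   whole number, or out of range. */
int parse_long_in_range(const char *text, long min, long max, long *out)
{
    char *end;
    long value;
    size_t len = strlen(text);

    if (len == 0 || len > MAX_NUMBER_LEN)
        return 0;
    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0')   /* no digits, or trailing junk */
        return 0;
    if (errno == ERANGE)               /* too large for a long */
        return 0;
    if (value < min || value > max)
        return 0;
    *out = value;
    return 1;
}
```

For example, parse_long_in_range("25", 0, 150, &age) accepts an age of 25, while "25x", "999" and an empty string are all rejected.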