User Input Validation with Regular Expressions

Input Validation with Regular Expressions
COEN 351
Input Validation

Security Strategies
- Black List
  - List all things that are NOT allowed
  - List is difficult to create
    - Adding insecure constructs on a continuous basis means that the previous version was unsafe
  - Testing is based on known attacks.
  - List from others might not be trustworthy.
- White List
  - List of things that are allowed
  - List might be incomplete and disallow good content
    - Adding exceptions on a continuous basis does not imply security holes in previous versions.
  - Testing can be based on known attacks.
  - List from others can be trusted if source can be trusted.
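The difference between the two strategies can be sketched in a few lines of Perl. The subroutine names, the 16-character limit, and the set of forbidden characters below are assumptions made for illustration only; they are not taken from the slides.

```perl
use strict;
use warnings;

# Hypothetical white-list rule: 1 to 16 letters, digits, or underscores.
# \A and \z anchor at the very start and end; unlike $, \z does not
# tolerate a trailing newline.
sub is_valid_username {
    my ($name) = @_;
    return $name =~ /\A[A-Za-z0-9_]{1,16}\z/ ? 1 : 0;
}

# Hypothetical black-list rule: reject a few known-dangerous characters.
sub passes_blacklist {
    my ($name) = @_;
    return $name =~ /[;|<>]/ ? 0 : 1;
}

for my $input ('alice_01', 'bob; rm -rf /', '$(reboot)') {
    printf "%-15s white list: %s, black list: %s\n", $input,
        is_valid_username($input) ? 'accept' : 'reject',
        passes_blacklist($input)  ? 'accept' : 'reject';
}
```

The third input slips past this black list because nobody thought to forbid `$(`, while the white list rejects it without knowing anything about the attack.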
Perl Regular Expressions

Regular Expression = Pattern
- Template that either matches or does not match a string
Excursus: Getting Input in Perl
Use <STDIN> to read from standard input
- Use the 'defined' construct to tell if the read was successful

while (defined($line = <STDIN>)) {
    print "I saw $line";
}
Excursus: Getting Input in Perl
Non-sensical shortcut
- Uses standard loop variable $_

while (<STDIN>) {
    print "I saw $_";
}
Gets a line, executes body of loop.

foreach (<STDIN>) {
    print "I saw $_";
}
Gets all the lines, then executes body of loop.

$_ is the default loop variable.
Excursus: Getting Input in Perl
STDIN is the default input of <> when no file names are given on the command line
- chomp acts on default variable $_

while (<>) {
    chomp;
    print "I saw $_\n";
}
Perl Regular Expressions
Matching and substitution are fundamental tasks in Perl
- Implemented using one-letter operators:
  - m/PATTERN/ (m//): pattern matching
  - s/PATTERN/REPLACEMENT/ (s///): substitution
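A minimal sketch of both operators in action; the sample sentence and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "The cat sat on the mat";

# m// tests for a match; the leading m is optional when / is the delimiter.
my $found = ($text =~ m/cat/) ? 1 : 0;

# s/// replaces the first match and returns the number of substitutions made.
my $count = ($text =~ s/mat/rug/);

print "found=$found count=$count text=$text\n";
```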
Perl Regular Expressions

Meta-characters in a pattern need escaping with a backslash to match literally:

\ | ( ) [ ] { } ^ $ * + ? .
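Escaping can be done by hand or left to Perl. The sketch below uses the built-in quotemeta function and the \Q...\E escape, which the slides do not mention; the sample string is an assumption.

```perl
use strict;
use warnings;

my $price = 'Total: $5.00 (approx.)';

# Escaping by hand: \$ and \. stand for the literal characters.
my $by_hand = ($price =~ /\$5\.00/) ? 1 : 0;

# quotemeta (or \Q...\E inside a pattern) escapes every metacharacter.
my $literal  = '(approx.)';
my $escaped  = quotemeta($literal);
my $by_quote = ($price =~ /$escaped/) ? 1 : 0;
my $by_Q     = ($price =~ /\Q$literal\E/) ? 1 : 0;

print "escaped form: $escaped\n";
print "matches: $by_hand $by_quote $by_Q\n";
```

Letting Perl do the escaping is the safer choice whenever the literal text comes from user input.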
Perl Regular Expressions

Interpolation
- Perl substitutes variables in patterns, just as it does in double-quoted strings:
$foo = "bar";
/$foo$/;
- Equivalent to:
/bar$/;
Perl Regular Expression: Binding Operator
Pattern matching is so frequent in Perl that there is a special operator
- Normally, pattern matching is done on default operand $_
- =~ binds a string expression to a pattern match (substitution, transliteration)

Perl Regular Expression: Binding Operator
- =~ has a string as its left operand
- =~ has a pattern as its right operand
  - Could be an expression interpreted at run time.
- Returns true / false depending on the success of the match.
- !~ is the same operation, but the result is negated.

Perl Regular Expression: Binding Operator
$_ =~ $pat;
is equivalent to
$_ =~ /$pat/;
but is less efficient than giving the pattern directly, since the regular expression has to be compiled at run time.
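When a pattern has to live in a variable, Perl's qr// operator (not covered on the slides) compiles it once and returns a reusable regex object. A small sketch with made-up sample lines:

```perl
use strict;
use warnings;

# qr// compiles the pattern once and returns a reusable regex object.
my $pat = qr/(\d+)-(\d+)/;

my @ranges;
for my $line ('pages 10-20', 'no range here', 'rows 3-7') {
    if ($line =~ $pat) {
        push @ranges, "$1..$2";
    }
}
print join(', ', @ranges), "\n";
```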
Perl Regular Expression: Binding Operator Example
if ( ($k,$v) = $string =~ m/(\w+)=(\w*)/ ) {
    print "Key $k Value $v\n";
}
- Since =~ has precedence over =, it is evaluated first.
- The binding operator binds variable $string to a pattern looking for expressions like "key=word".
- The binding expression is evaluated in list context, hence the resulting matches are returned as a list.
- The list is then assigned to ($k,$v).
- The result of the assignment, as seen by the if, is the number of elements on its right-hand side, i.e. typically 2.
- Since 2 is not 0, this is equivalent to true and hence the if-block is entered.
Perl Regular Expressions

Quantifiers:
- * matches the preceding character zero or more times.
  - Pattern "abc*d" is matched by
    - rabd
    - zabccccd
- Use parentheses to group letters

#!/perl/bin/perl
# * applies to the single character 'c'
while (<>) {
    chomp;
    last if $_ eq 'stop';
    if (/abc*d/) {
        print "Matched: |$`<$&>$'|\n";
    } else {
        print "No match.\n";
    }
}

#!/perl/bin/perl
# * applies to the whole group 'bc'
while (<>) {
    chomp;
    last if $_ eq 'stop';
    if (/a(bc)*d/) {
        print "Matched: |$`<$&>$'|\n";
    } else {
        print "No match.\n";
    }
}
Perl Regular Expressions

Quantifiers:
- '*' matches zero or more instances
- '+' matches one or more instances
  - "ab(cde)+fg"
- '?' matches none or one
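The three quantifiers can be compared on the slide's own pattern. The helper count_matches and the sample strings are assumptions for illustration; the ^ and $ anchors force the whole string to match.

```perl
use strict;
use warnings;

# Count how many of the given strings match a pattern.
sub count_matches {
    my ($re, @strings) = @_;
    return scalar grep { $_ =~ $re } @strings;
}

# Zero, one, and two copies of the group "cde".
my @samples = ('abfg', 'abcdefg', 'abcdecdefg');

my $star = count_matches(qr/^ab(cde)*fg$/, @samples);
my $plus = count_matches(qr/^ab(cde)+fg$/, @samples);
my $opt  = count_matches(qr/^ab(cde)?fg$/, @samples);

print "* matches $star, + matches $plus, ? matches $opt of 3 strings\n";
```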
Perl Regular Expressions

Alternatives
- '|' means "or"
  - Either the right or the left side matches
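Because '|' splits the entire pattern, alternatives usually need parentheses once anchors are involved. A short sketch with made-up words:

```perl
use strict;
use warnings;

my @words = ('cat', 'dog', 'catalog', 'hotdog', 'bird');

# Without parentheses, | splits the whole pattern: "^cat" or "dog$".
my @loose = grep { /^cat|dog$/ } @words;

# With parentheses, the anchors apply to both alternatives.
my @strict = grep { /^(cat|dog)$/ } @words;

print "loose:  @loose\n";
print "strict: @strict\n";
```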
Perl Regular Expressions

Character Classes
- List of possible characters inside square brackets
- Examples:
  - [a-cw-z]+
  - [a-zA-Z0-9]
- Negation provided by caret
  - [^n\-z] matches any character but 'n', '-', 'z'
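The three classes above can be tried on a few sample tokens (the tokens are invented for this sketch). Each pattern is anchored so that every character of the token has to belong to the class.

```perl
use strict;
use warnings;

my @tokens = ('abc', 'wxyz', 'hello', 'A1b2', 'n-z');

my @range_only = grep { /^[a-cw-z]+$/ } @tokens;     # only a-c and w-z
my @alnum      = grep { /^[a-zA-Z0-9]+$/ } @tokens;  # letters and digits
my @negated    = grep { /^[^n\-z]+$/ } @tokens;      # no 'n', '-', or 'z'

print "[a-cw-z]:    @range_only\n";
print "[a-zA-Z0-9]: @alnum\n";
print "[^n\\-z]:     @negated\n";
```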
Perl Regular Expressions

Character class shortcuts
- \w (word) is a shortcut for [A-Za-z0-9_]
- \s (space) is a shortcut for [\f\t\n\r ]
- \d (digit) is a shortcut for [0-9]
- [^\d] anything but a digit (also written \D)
- [^\s] anything but a space character (also written \S)
- [^\w] anything but a word character (also written \W)
Perl Regular Expressions

Perl regex semantics are based on:
- Greed
  - Perl tries to match as much of an expression as is possible
- Eagerness
  - Perl gives the first possible match
  - The left-most match wins
- Backtracking
  - The entire expression needs to match
  - Perl regex evaluation backtracks if match is impossible
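The three rules can be observed on small inputs. The sample strings are assumptions, and the non-greedy quantifier `.*?` is an addition that the slides do not cover.

```perl
use strict;
use warnings;

my $html = '<b>bold</b> and <i>italic</i>';

# Greed: .* grabs as much as possible, then backtracks to the last '>'.
my ($greedy) = $html =~ /(<.*>)/;

# Non-greedy: .*? stops at the first '>' that lets the match succeed.
my ($lazy) = $html =~ /(<.*?>)/;

# Eagerness: the left-most match wins, even though "boo" would also match.
my ($first) = 'foo boo' =~ /(.oo)/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
print "first:  $first\n";
```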
Perl Regular Expressions

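Eagerness Example:
- What is the result of this snippet?
$string = "boo hoo";
$string =~ s/o*/e/;    # left side of =~ needs to be an l-value
- Candidate answers:
  - boo hoo
  - be hoo
  - bee hoo
  - boo heo
  - boo hee
  - eboo hoo
- Answer: eboo hoo. The pattern o* can match zero characters, so the left-most match is the empty string at the very start of $string.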
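The snippet is easy to run. The sketch below also tries two variations that are not on the slide (o+ and the /g modifier), working on copies so that the original string stays intact.

```perl
use strict;
use warnings;

my $string = "boo hoo";

# Copy first, then substitute on the copy.
(my $once = $string) =~ s/o*/e/;   # o* matches the empty string at position 0
(my $plus = $string) =~ s/o+/e/;   # o+ needs at least one 'o'
(my $all  = $string) =~ s/o+/e/g;  # /g replaces every match

print "s/o*/e/  gives '$once'\n";
print "s/o+/e/  gives '$plus'\n";
print "s/o+/e/g gives '$all'\n";
```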
Perl Regular Expressions


Quantifiers *, +, ? are not always enough
- Specify number of occurrences by placing a comma-separated range in curly brackets
  - /a{2,12}/ 2 to 12 'a'
  - /a{5,}/ 5 or more 'a'
  - /a{5}/ exactly 5 'a'
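A quick check of the three ranges against runs of 'a' of different lengths (the lengths are chosen for illustration). The patterns are anchored here; without ^ and $, /a{5}/ would also match inside a longer run.

```perl
use strict;
use warnings;

my @lengths = (1, 2, 5, 12, 13);
my (@two_to_twelve, @five_or_more, @exactly_five);

for my $n (@lengths) {
    my $run = 'a' x $n;    # a string of $n copies of 'a'
    push @two_to_twelve, $n if $run =~ /^a{2,12}$/;
    push @five_or_more,  $n if $run =~ /^a{5,}$/;
    push @exactly_five,  $n if $run =~ /^a{5}$/;
}

print "a{2,12} accepts lengths: @two_to_twelve\n";
print "a{5,}   accepts lengths: @five_or_more\n";
print "a{5}    accepts lengths: @exactly_five\n";
```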
Perl Regular Expressions

Anchors
- A pattern can match anywhere in the string unless you use anchors
  - ^ beginning of string
  - $ end of string
  - \b word boundary: start or end of a group of \w characters
  - \B non-word boundary anchor
- Examples:
  - /^hello/ matches only at beginning of string
  - /world$/ matches only at the end of string
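A small sketch of the anchors at work. The sample strings are assumptions; the `() =` idiom counts how many times a /g match succeeds.

```perl
use strict;
use warnings;

my $greeting = 'hello world';
my $starts_hello = ($greeting =~ /^hello/) ? 1 : 0;
my $starts_world = ($greeting =~ /^world/) ? 1 : 0;
my $ends_world   = ($greeting =~ /world$/) ? 1 : 0;

my $text = 'cat concatenate';
my $anywhere = () = $text =~ /cat/g;       # every occurrence
my $whole    = () = $text =~ /\bcat\b/g;   # only as a separate word
my $inside   = () = $text =~ /\Bcat\B/g;   # only inside a longer word

print "^hello: $starts_hello, ^world: $starts_world, world\$: $ends_world\n";
print "cat anywhere: $anywhere, whole word: $whole, inside a word: $inside\n";
```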
Perl Regular Expressions

Parentheses and Memory
- ( ) group together part of a pattern
- Also remember the corresponding matched part of the string.
- These are put into a backreference
  - Made by backslash followed by number
  - Available as $1, … after matching
- Examples
  - /(.)\1/ matches any character followed by itself
  - /../ matches any two characters
  - /(['"]).*\1/ matches any string starting with a single or double quote followed by zero or more arbitrary characters followed by the same type of quote.
    - "does not match'
    - "does match"
    - 'does match'
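The quote-matching pattern can be tried directly. In this sketch a second pair of parentheses is added to capture the quoted body, and the input strings are made up.

```perl
use strict;
use warnings;

my @inputs = (q{say "hello"}, q{say 'hello'}, q{say "hello'});
my @results;

for my $input (@inputs) {
    # \1 must repeat whatever quote character group 1 captured.
    if ($input =~ /(['"])(.*)\1/) {
        push @results, "quote=$1 body=$2";
    } else {
        push @results, 'no match';
    }
}
print "$_\n" for @results;

# /(.)\1/ finds the first doubled character.
my ($char) = 'bookkeeper' =~ /(.)\1/;
print "first doubled character: $char\n";
```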
Perl Regular Expressions
Validating e-mail
- Out of channel verification:
  - Lookup of DNS records for MX records
    - Assumes site connectivity
  - Ask for email addresses twice to weed out typos.
  - Send email to address given.
    - Still need to prevent command-line insertion
- Regular expressions
  - Typically have subtle errors
    - tom&jerry@warnerbros.com is valid, but fails simple regex
    - president@whitehouse.gov is valid, deliverable, but probably fake
Perl Regular Expressions
Validating email
- if ( $email =~ /\@/ ) { … }
  - checks for an at sign
- if ( $email =~ /\S+\@\S+/ )
  - checks for non-white space characters divided by an at sign
  - matches thomas@hotmail
- if ( $email =~ /\S+\@\S+\.\S+/ )
- if ( $email =~ /[\w\-]+\@[\w\-]+\.[\w\-]+/ )
  - matches most valid emails, but allows multiple emails
- if ( $email =~ /^[\w\-]+\@[\w\-]+\.[\w\-]+$/ )
  - anchored at beginning and end of string
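The anchored pattern can be wrapped in a subroutine and probed with a few addresses. The name looks_like_email and the test addresses are assumptions; the results show both what the anchors fix and which valid addresses the pattern still rejects.

```perl
use strict;
use warnings;

# Hypothetical helper built on the last, anchored pattern.
sub looks_like_email {
    my ($email) = @_;
    return $email =~ /^[\w\-]+\@[\w\-]+\.[\w\-]+$/ ? 1 : 0;
}

my @tests = (
    'thomas@hotmail.com',         # accepted
    'thomas@hotmail',             # rejected: no dot after the at sign
    'a@b.com, c@d.com',           # rejected: anchors forbid a second address
    'tom&jerry@warnerbros.com',   # valid address, but '&' is not in [\w\-]
    'user@mail.example.com',      # valid address, but only one dot is allowed
);

for my $address (@tests) {
    printf "%-26s %s\n", $address,
        looks_like_email($address) ? 'accepted' : 'rejected';
}
```

The last two rejections illustrate the "subtle errors" a hand-written pattern tends to have.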
Perl Regular Expressions

Checking for strings that only contain alphabetic characters.
- ASCII based regex is insufficient:
  if ($var =~ /^[a-zA-Z]+$/)
  - Does not work for characters with diacritic marks
- Best solution is to use Unicode properties
  if ($var =~ /^[^\W\d_]+$/)
- Explanation:
  - \w matches alphabetic, numeric, underscore (alphanumunder)
  - \W is a non-alphanumunder
  - [^\W\d_] is a character that is not a non-alphanumunder, not a digit, and not an underscore, hence an alphabetic character
- Could also use POSIX character classes, but those depend on locale
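The sketch below compares the ASCII class, the double-negation class, and the Unicode letter property \p{L} (an addition not shown on the slide). The test words are assumptions; the first one is written with \x{...} escapes so that the example does not depend on the encoding of the source file.

```perl
use strict;
use warnings;

# "\x{141}\x{f3}d\x{17a}" spells the city name Lodz with its Polish diacritics.
my @words = ("\x{141}\x{f3}d\x{17a}", 'abc', 'abc123', 'snake_case');

my (@ascii_ok, @unicode_ok, @property_ok);
for my $word (@words) {
    push @ascii_ok,    $word if $word =~ /^[a-zA-Z]+$/;
    push @unicode_ok,  $word if $word =~ /^[^\W\d_]+$/;
    push @property_ok, $word if $word =~ /^\p{L}+$/;
}

printf 'ASCII class accepts %d, [^\W\d_] accepts %d, \p{L} accepts %d of %d words' . "\n",
    scalar @ascii_ok, scalar @unicode_ok, scalar @property_ok, scalar @words;
```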
Perl Regular Expressions

Making regex readable
- Place semantic units into a variable with an appropriate name
$optional_sign    = '[-+]?';
$mandatory_digits = '\d+';
$decimal_point    = '\.?';
$optional_digits  = '\d*';
$number = $optional_sign
        . $mandatory_digits
        . $decimal_point
        . $optional_digits;
if ( /($number)/ ) { … }
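Another way to get the same readability, not shown on the slide, is the /x modifier, which lets a pattern contain whitespace and comments. This sketch rebuilds the number pattern that way and anchors it so that the whole string has to be a number; the test strings are assumptions.

```perl
use strict;
use warnings;

# With /x, whitespace and #-comments inside the pattern are ignored.
my $number = qr/
    [-+]?   # optional sign
    \d+     # mandatory digits
    \.?     # optional decimal point
    \d*     # optional digits
/x;

my @candidates = ('42', '-3.14', '+7.', 'abc', '.5');
my @accepted   = grep { /^$number$/ } @candidates;

print "accepted: @accepted\n";
```

Note that '.5' is rejected because the pattern demands at least one digit before the decimal point.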