Input Sanitization
COEN 225

All Input is Evil
- All input is evil, at least potentially.
- Input can be (a random collection):
  - Files
  - Web forms
  - Cookies
  - Registry entries
  - Database contents
  - Command line arguments
  - Environment variables
  - HTTP requests
  - Named pipes
  - E-mail
  - …

Finding Common Entry Points
- Files
  - Contain data specified by users
  - Contain data supplied by the application
  - Can be intentionally or unintentionally corrupted
  - An attacker can also attack file metadata:
    - Extension
    - Path
    - File system attributes
    - …
- Sockets
  - It is easy to connect to a socket, so incoming data must be filtered
  - An attacker can:
    - Monitor data
    - Send malformed data to the client or to the server
    - Intercept data in the middle of a request and replace it (a.k.a. man-in-the-middle attack)
- HTTP requests
  - Almost always pass through firewalls
  - Using a web proxy, users have complete control over what is sent to the server
- Named pipes
  - See sockets
  - But programmers might forget how named pipes work and trust the input
  - E.g. SQL Server 2000 vulnerability; see http://www.blakewatts.com/namedpipepaper.html
- Pluggable protocol handlers
  - Examples: http, ftp, https in a URL; mailto:tschwarz@scu.edu?subject=WrongPerson
  - Tell the system which application handles the data when a hyperlink is clicked
  - The maliciously crafted link irc://[~900 characters] caused a buffer overflow in the mIRC protocol handler that allowed arbitrary code execution
- Programmatic interfaces
  - RPC
  - COM
  - DCOM
  - ActiveX
  - Managed code entry points (Windows)
  - .NET Remoting
- SQL
  - Improperly filtered input strings can lead to execution of powerful SQL commands
- Registry
- User interfaces
  - Win95 machines were used in libraries
  - An attacker could remove the "Start" button for free entertainment
- Command line arguments
  - The attacker provides a helpful link with arguments embedded
  - Example: cross-site scripting attacks
- Environment variables
  - Can be used by programs to make decisions

Canonicalization
- The authentication decision is made by one module
- The access is done by another module

Input Validation
- Input: anything controlled by an outsider
  - User command line input
  - Configuration files that could be manipulated
  - HTTP requests
  - Packets under consideration by a firewall
  - …

Input Validation: Security Strategies
- Black list
  - Lists all things that are NOT allowed
  - The list is difficult to create
  - Adding insecure constructs on a continuous basis means that the previous version was unsafe
  - Testing is based on known attacks
  - A list from others might not be trustworthy
- White list
  - Lists the things that are allowed
  - The list might be incomplete and disallow good content
  - Adding exceptions on a continuous basis does not imply security holes in previous versions
  - Testing can be based on known attacks
  - A list from others can be trusted if the source can be trusted

Input Validation
- Principal problem: the location of the check differs from the location of use
- Principal solution: canonicalization of input
  - Transform the input into a canonical form
  - The decision is made on the input in the same form that the program uses

Canonicalization
- Two major program errors:
  - Misunderstanding the definition of the canonical form
  - Stopping the canonicalization process too early

Canonicalization: Dealing with Metacharacters
- Meta-information can be attached
  - Out-of-band
  - In-band
    - Often more readable
    - Often more compact
    - Has security implications
- Potential for overlapping trust domains:
  - There exists a logical boundary between data and metadata
  - The parser needs to identify the difference between data and metadata correctly
- Example: NUL character for termination of strings
- Simplest vulnerability: users can embed metacharacters into input that is not filtered
  - Instance of a second-order injection attack
  - The attack happens when the metacharacter is evaluated
  - Example: password update

Password update example (no input sanitization!):

    use CGI;
    … verify session details …
    # The new password is taken from the request without any filtering.
    $new_password = $query->param('password');
    open(IFH, "</opt/passwords.txt") || die("$!");
    open(OFH, ">/opt/passwords.txt.tmp") || die("$!");
    while (<IFH>) {
        chomp;
        ($user, $pass) = split /:/;
        if ($user ne $session_username) {
            print OFH "$user:$pass\n";          # copy other users unchanged
        } else {
            print OFH "$user:$new_password\n";  # an embedded newline starts a new record
        }
    }
    …
    close(IFH);
    close(OFH);

- User bob inputs: test\njim:npwd (where \n is a newline character)
- OFH becomes:
      bob:test
      jim:npwd
- Bob just added a new user

Discovering attacks like this:
1. Identify code that deals with metacharacter strings
2. Identify all delimiter characters that are specially handled and put them into a list
3. Identify the filtering performed on input
4. Eliminate from the list the potentially hazardous delimiter characters that the filtering handles
5. The characters remaining on the list indicate a vulnerability

Example: handling an uploaded file

    BOOL HandleUploadedFile(char *filename)
    {
        char buf[MAX_PATH], pathname[MAX_PATH];
        char *fname = filename, *tmp1, *tmp2;
        DWORD rc;
        HANDLE hFile;

        /* Keep only what follows the last '/' or '\\'. */
        tmp1 = strrchr(filename, '/');
        tmp2 = strrchr(filename, '\\');
        if (tmp1 || tmp2)
            fname = (tmp1 > tmp2 ? tmp1 : tmp2) + 1;
        if (!fname)
            return FALSE;
        /* Reject directory traversal in the remaining name. */
        if (strstr(fname, ".."))
            return FALSE;
        /* Prefix the name, then expand environment variables such as %TEMP%. */
        _snprintf(buf, sizeof(buf), "\\\\?\\%%TEMP%%\\%s", fname);
        rc = ExpandEnvironmentStrings(buf, pathname, sizeof(pathname));
        if (rc == 0 || rc > sizeof(pathname))
            return FALSE;
        hFile = CreateFile(pathname, …);
        … read bytes into the file …
    }

Applying the steps to this code:
- The input string is formatted in a number of ways before it becomes a file name: it is added to a statically sized buffer and prefixed with \\?\%TEMP%\
- Set of delimiter characters that are specially handled: '/', '\' and ".."
- The string is passed to ExpandEnvironmentStrings(); environment variables are denoted with '%' characters
- Filtering:
  - strrchr searches for the last occurrence of '/' and '\', and the code advances past it
  - strstr searches for ".." in the remaining name
- However, '%' remains
  - The client can supply a number of environment variables, such as QUERY_STRING
  - In addition, something like ..\..\..\any\pathname\file.txt supplied in QUERY_STRING allows the client to write to arbitrary locations in the file system

Canonicalization: Dealing with Metacharacters: NUL character injection
- NUL characters are necessary to terminate strings when calling C routines from the OS and many APIs
- Perl and other languages do not use NUL for termination
- Example: a Perl application programmer tests that the file name ends in ".txt"

      open(FH, ">$username.txt") || die("$!");
      print FH $data;
      close(FH);

  - The attacker inputs the sequence "%00" in the CGI input
  - It is decoded as a NUL character
  - It can be used to cut off the file name, including the extension

Canonicalization: Dealing with Metacharacters: NUL
- The NUL metacharacter is used to end C strings, but not strings in Perl, Java, PHP, …
- This is a canonicalization issue:
  - C-based modules canonicalize strings differently than the non-C/non-Unix world
  - Issues arise when strings cross boundaries between these worlds
- Possible results:
  - Memory corruption because strlen returns a different value
  - Truncation of strings
  - False decisions, especially for FILE NAMES

      B O B . T X T \0
      B O B \0 . T X T \0

Path Metacharacters
- Windows file names: C:\WINDOWS\system32\calc.exe
  - Optional device
  - Followed by path
- NOT UNIQUE:
      C:\WINDOWS\system32\drivers\..\calc.exe
      calc.exe
      .\calc.exe
      ..\calc.exe
      \\?\WINDOWS\system32\calc.exe
- The file system uses file canonicalization
- But the system is less than canonical

Path Metacharacters: Issues
- File squatting (in Windows)
  - CreateFile must be used carefully so that it does not open an existing file that sits in the canonical path of the file name
- CreateFile canonicalization
  - Eliminates any directory traversal components before validating whether each path segment exists
  - C:\nonexistent\path\..\..\blah.txt accesses C:\blah.txt
- File-like objects
  - CreateFile can open objects that are treated like files but are not files: \\host\object type\name
- Device files
  - Reside in the file hierarchy
  - But are canonicalized differently
  - COM1-9, LPT1-9, CON, CONIN$, CONOUT$, PRN, AUX, CLOCK$, NUL
  - Programmers are often not aware of the rules

Path Metacharacters: CreateFile() (Windows) idiosyncrasies
- Strips out trailing spaces in file names
  - Example attack:
    - The programmer attaches ".txt" to a user-provided name
    - The attacker provides "helloworld.exe " with a trailing space
    - The trailing space with the following .txt is stripped out
- Case sensitivity
  - Windows filenames are not case sensitive; UNIX filenames are, and HFS+ filenames only on volumes formatted as case-sensitive
- DOS 8.3
  - A short file name is created by the file system if the file name is too long
  - The file can be referred to by the short file name
- Use \\?\ before the file name to disable DOS filename parsing
- Ensure that files are normal files by checking for FILE_ATTRIBUTE_NORMAL, or face access to named pipes, …
- Alternate Data Streams are created with a ":" separator

Path Metacharacters: Registry keys
- Naming is similar to files
- Similar issues
- Worthy of its own presentation

Canonicalization: Dealing with Metacharacters: Shell Metacharacter Injection
- Attack vector: the user controls input to an argument for execve(), popen(), …
- Dangerous shell characters:
      ; | & < > ` ! - * / ? ( ) . [space] [ ] "\t" ^ ~ \ "\\" quotes "\r" "\n" $

Canonicalization: Dealing with Metacharacters: SQL Injection Attack
- Attack vector: the user controls part of the SQL query string

Canonicalization: Metacharacter Filtering
- Three basic options:
  1. Detect erroneous input and reject what appears to be an attack
  2. Detect and strip dangerous characters
  3. Detect and encode dangerous characters with a metacharacter escape sequence

Canonicalization: Metacharacter Filtering: Eliminating Metacharacters
- Whitelisting: allow only good strings
      if ($input_data =~ /[^A-Za-z0-9_ ]/) { exit; }
- Whitelisting: strip away anything that is not good
      $input_data =~ s/[^A-Za-z0-9]//g;
  - Stripping is vulnerable to mistakes
- Blacklisting: make decisions based on dangerous characters (not recommended)

Canonicalization: Metacharacter Filtering: Escaping Metacharacters
- Non-destructive: metacharacters are preserved in the string
- Goal: the receiving module receives a safe string
- Attack vectors:
  - Metacharacter evasion
  - Encoded metacharacters can be used to avoid other filtering
- Filtering does not detect encoded metacharacters
  - Example: ..%2F..%2Fetc%2Fpasswd
- Double encoding attacks:
      dXNlcj1wYXNzd2QmaG9tZWRpcj0uLiUyNSUzMiU0Ni4uJTL1JTMyJTQ
        Base64 decoder
      user=passwd&homedir=..%25%32%46..%25%32%46etc
        Hexadecimal decoder, pass 1
      user=passwd&homedir=..%2F..%2Fetc
        Hexadecimal decoder, pass 2
      user=passwd&homedir=../../etc

Canonicalization: Metacharacter Filtering: Character Sets
- Example vulnerabilities:
  - Wide characters (Unicode)
    - Wide-character C-style strings are terminated with a 16-bit NUL, normal character strings with an 8-bit NUL
    - Length calculations need to take the character set into account (wide characters vs. normal characters)
  - Homographic attacks
    - Different characters look the same
    - The strings "Microsoft" and "Microsoft" look the same, but in Unicode one 'o' can be Cyrillic
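
Sketch: canonicalize fully, then whitelist

The double encoding attack and the %00 injection described earlier both succeed because a check runs on a form of the input that differs from the form finally used. The following is a minimal sketch of one possible defense, not a definitive implementation: decode percent-encoding until nothing changes, reject any encoded NUL, and only then apply a whitelist. All names here (hex_value, percent_decode_once, canonicalize, is_whitelisted, validate_field) and the bound MAX_DECODE_PASSES are hypothetical and chosen for illustration. The whitelist of letters, digits and underscore mirrors the Perl patterns above and assumes the default "C" locale.

```c
#include <ctype.h>

#define MAX_DECODE_PASSES 4  /* assumed bound, not taken from the slides */

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode every %XX sequence in place, one pass only.
 * Returns the number of sequences decoded, or -1 if a sequence
 * decodes to NUL (such input is rejected outright). */
static int percent_decode_once(char *s)
{
    char *out = s;
    int decoded = 0;

    while (*s) {
        int hi, lo;
        if (s[0] == '%' && (hi = hex_value(s[1])) >= 0
                        && (lo = hex_value(s[2])) >= 0) {
            int byte = hi * 16 + lo;
            if (byte == 0)
                return -1;          /* "%00" would truncate a C string */
            *out++ = (char)byte;
            s += 3;
            decoded++;
        } else {
            *out++ = *s++;          /* copy ordinary characters unchanged */
        }
    }
    *out = '\0';
    return decoded;
}

/* Decode until a pass changes nothing. Returns 0 on success, -1 if the
 * input contains an encoded NUL or has not settled within the bound. */
static int canonicalize(char *s)
{
    for (int pass = 0; pass < MAX_DECODE_PASSES; pass++) {
        int n = percent_decode_once(s);
        if (n < 0) return -1;
        if (n == 0) return 0;
    }
    return -1;
}

/* Whitelist: non-empty, only letters, digits and underscore. */
static int is_whitelisted(const char *s)
{
    if (*s == '\0') return 0;
    for (; *s; s++) {
        if (!isalnum((unsigned char)*s) && *s != '_') return 0;
    }
    return 1;
}

/* Returns 1 if the field is acceptable; s is canonicalized in place. */
int validate_field(char *s)
{
    return canonicalize(s) == 0 && is_whitelisted(s);
}
```

With this sketch, the doubly encoded value ..%252F..%252Fetc is decoded to ../../etc before the whitelist sees it and is rejected, while a whitelist applied to the raw input would have seen only '%' and hex digits. Rejecting input that is still changing after the bound is a deliberately conservative choice. A real application should match the number of decoding passes to the decoders that actually sit between the check and the use.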