Input Validation – “All input is evil”
CS2

Background

Summary: Any input that comes into a program from an external source, such as a user typing at a keyboard or a network connection, can be the source of security concerns and potentially disastrous bugs. All input should be treated as potentially dangerous.

Description: All interesting software packages rely upon external input. Although information typed at a keyboard might be the most familiar, networks and external devices can also send data to a program. Generally, this data will be of a specific type: for example, a user interface that requests a person’s name might be written to expect a series of alphabetic characters. If data of the correct type and form is provided, the program might work fine. However, if programs are not carefully written, attackers can construct inputs that cause malicious code to be executed.

Risk – How can it happen? Any data that enters your program from an external source is a potential source of problems. If external data is not checked to verify that it has the right type, the right amount, and the right structure, it can cause problems. Input validation errors can lead to buffer overflows if the data provided is used, unchecked, as an index into an array. Input that is used as the basis for a database search can be used for SQL injection attacks, which use carefully constructed inputs to make relational databases reveal data inappropriately or even destroy data.

Example of Occurrence: A Norwegian woman mistyped her account number on an internet banking system. Instead of typing her 11-digit account number, she accidentally typed an extra digit, for a total of 12 digits. The system discarded the extra digit and transferred $100,000 to the (incorrect) account given by the 11 remaining digits.
A simple dialog box informing her that she had typed too many digits would have gone a long way towards avoiding this expensive error.
Olsen, Kai. “The $100,000 Keying error” IEEE Computer, August 2008

Example in Code: This program stores the squares of the numbers from one to ten in an array, and then asks the user to type a number. The square of that number is then printed:

    import java.util.Scanner;

    public class InputValidationExample {
        public static void main(String[] args) {
            int[] vals = new int[10];
            for (int i = 0; i < 10; i++) {
                vals[i] = (i+1)*(i+1);
            }
            System.out.print("Please type a number: ");
            Scanner sc = new Scanner(System.in);
            int which = sc.nextInt();
            int square = vals[which-1];
            System.out.println("The square of "+which+" is "+square);
        }
    }

This program has two input validation problems. The first comes with the use of the scanner to read an integer from the console: int which = sc.nextInt(). If the user types a number, this will work just fine. However, if the user types something that is not a number, an InputMismatchException will be thrown. A robust program would catch this error, provide a clear and appropriate error message, and ask the person to re-type their input.

The second problem occurs when the array is accessed. Even if the user provides an appropriate integer, the value may be out of the range of the array. A Java array containing 10 elements can only be accessed by indices 0,1,...,9. Thus, the only values of which that will work correctly are 1,2,...,10. Any value outside of this range will lead to an attempt to access an element outside the bounds of the array. In Java, this will lead to an ArrayIndexOutOfBoundsException. In other languages, this may lead to a buffer overflow that might be exploited by malicious software.

How can I avoid input validation problems?

Check your input: The basic rule for input validation is to check that input data matches all of the constraints that it must meet to be used correctly in the given circumstance.
In many cases, this can be very difficult: confirming that a set of digits is, in fact, a telephone number may require consideration of the many differing phone number formats used by countries around the world. Some of the checks that you might want to use include:

Type: Input data should be of the right type. Names should generally be alphabetic, numbers numeric. Punctuation and other uncommon characters are particularly troubling, as they can often be used to form the basis of code-injection attacks. Many programs handle input data by assuming that all input is of string form, verifying that the string contains appropriate characters, and then converting the string into the desired data type.

Range: Verify that numbers are within a range of possible values. For example, the month of a person's date of birth should lie between 1 and 12. Another common range check involves values that may lead to division-by-zero errors.

Plausibility: Check that values make sense: a person's age shouldn't be less than 0 or more than 150.

Presence check: Guarantee the presence of important data. The omission of important data can be seen as an input validation error.

Length: Input that is either too long or too short will not be legitimate. Phone numbers generally don't have 39 digits; Social Security Numbers have exactly 9 digits.

Format: Dates, credit card numbers, and other data types have limitations on the number of digits and on any other characters used for separation. For example, dates are usually specified by one or two digits for the month, one or two for the day, and either two or four for the year.

Checksums: Identification numbers, such as bank account numbers, often have check digits: additional digits included at the end of a number to provide a verifiability check. The check digit is determined by a calculation based on the remaining digits. If the check digit does not match the result of the calculation, either the ID is bad or the check digit is bad.
In either case, the number should be rejected as invalid.

Use appropriate language tools: The safety of tools that read user input varies across programming languages and systems. Some languages, such as C and C++, have library calls that read user input into a character buffer without checking the bounds of that buffer (the C function gets is a well-known example), causing both a buffer overflow and an input validation problem. Alternative libraries specifically designed with security in mind are often more robust.

The choice of programming language can play a role in the potential severity of input validation vulnerabilities. As statically typed languages, Java and C++ require that the type of data stored in a variable is known ahead of time. This requirement leads to the type mismatch problem when, for example, a string such as “abcd” is typed in response to a request for an integer. Dynamically typed languages such as Perl and Ruby do not have any such requirement: any variable can store any type of value. Of course, these languages do not eliminate validation problems. You may still run into trouble if you use a string to retrieve an item from an integer-indexed array. Some languages provide additional help in the form of built-in procedures that can be used to remove potentially damaging characters from input strings.

Recover Appropriately: A robust program will respond to invalid input in a manner that is appropriate, correct, and secure. For user input, this will often mean providing an informative error message and requesting re-entry of the data. Invalid input from other sources, such as a network connection, may require alternate measures. Arbitrary decisions such as truncating or otherwise reformatting data to “make it fit” should be avoided.
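The type, range, and checksum checks described above can be sketched in Java as follows. This is a minimal illustration, not a complete validation library: the class name ValidationSketch and the method names parseInRange and passesLuhn are hypothetical, and the Luhn calculation is only one widely used check-digit scheme (it is the one used for payment card numbers). Other identifiers, including many bank account numbers, use different calculations.

```java
import java.util.OptionalInt;

final class ValidationSketch {

    /**
     * Type and range check: treat the input as a string first, convert it,
     * and accept it only if it lies in [min, max]. Returns an empty
     * OptionalInt instead of guessing or truncating.
     */
    static OptionalInt parseInRange(String text, int min, int max) {
        if (text == null) {
            return OptionalInt.empty();            // presence check
        }
        try {
            int value = Integer.parseInt(text.trim());
            if (value < min || value > max) {
                return OptionalInt.empty();        // range check failed
            }
            return OptionalInt.of(value);
        } catch (NumberFormatException e) {
            return OptionalInt.empty();            // type check failed
        }
    }

    /**
     * Checksum check using the Luhn algorithm: starting from the rightmost
     * digit, double every second digit, subtract 9 from any result above 9,
     * and require the total to be a multiple of 10.
     */
    static boolean passesLuhn(String digits) {
        if (digits == null || digits.isEmpty()) {
            return false;
        }
        int sum = 0;
        boolean doubleIt = false;
        for (int i = digits.length() - 1; i >= 0; i--) {
            char c = digits.charAt(i);
            if (c < '0' || c > '9') {
                return false;                      // only digits are allowed
            }
            int d = c - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9;
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    public static void main(String[] args) {
        System.out.println(parseInRange("7", 1, 10).isPresent());    // true
        System.out.println(parseInRange("11", 1, 10).isPresent());   // false
        System.out.println(parseInRange("abcd", 1, 10).isPresent()); // false
        System.out.println(passesLuhn("4111111111111111"));          // true
    }
}
```

A caller that receives an empty result would typically print an error message and ask again, rather than silently adjusting the value, in line with the advice under Recover Appropriately.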
Laboratory/Homework Assignment:

Consider this program:

    import java.util.*;

    public class Input {
        public static void main(String[] args) {
            Scanner scan = new Scanner(System.in);
            int sz = getArraySize(scan);
            String[] names = getNames(scan,sz);
            int which = getWhich(scan);
            String aName = getName(which,names);
            System.out.println("You chose name: "+aName);
        }

        public static int getArraySize(Scanner scan) {
            System.out.print("How many names? ");
            int n = scan.nextInt(); // V: not checked for type, length, format, or reasonableness
            scan.nextLine();
            return n;
        }

        public static String[] getNames(Scanner scan, int sz) {
            String[] names = new String[sz];
            for (int i = 0; i < sz; i++ ){
                System.out.print("type name # "+(i+1)+": ");
                names[i] = scan.nextLine(); // V: not checked for type (probably not integer), length, or reasonableness
            }
            return names;
        }

        public static int getWhich(Scanner scan) {
            System.out.print("Which name: ");
            int x = scan.nextInt(); // V: not checked for type, length, format, or reasonableness
            return x;
        }

        public static String getName(int n,String[] vals) {
            return vals[n-1];
        }
    }

1. Complete the following checklist for this program.
   The statements marked V in the listing above receive external input.

2. List the potential input validation errors.
   The index returned by getWhich() is not validated. Also, if a non-integer value is typed at the prompts in getArraySize() and getWhich(), an exception will be thrown. The value returned by getArraySize() is not checked for reasonableness: it might be absurdly large, or negative.

3. Provide example inputs that might cause validation problems, and describe the problems that they might cause.
   If the number typed for getWhich() is greater than the number provided for getArraySize(), or is less than 1, the index used in getName() will be out of bounds, and an ArrayIndexOutOfBoundsException will be thrown. A negative number typed for getArraySize() causes a NegativeArraySizeException when the array is created. See the next question as well.

4. What happens if you type non-numeric characters for either the number of names or which name you wanted to retrieve?
   An InputMismatchException will be thrown.

5. Revise the program to properly validate input and gracefully recover from errors.

    import java.util.*;

    public class Input2 {
        // Upper bound on the number of names; 100 is an arbitrary limit chosen for illustration.
        private static final int MAX_NAMES = 100;

        public static void main(String[] args) {
            Scanner scan = new Scanner(System.in);
            int sz = getArraySize(scan);
            String[] names = getNames(scan,sz);
            int which = getWhich(scan,names.length);
            String aName = getName(which,names);
            System.out.println("You chose name: "+aName);
        }

        public static int getArraySize(Scanner scan) {
            int n = -1;
            // Require at least one name so that getWhich() always has a valid choice.
            while (n < 1 || n > MAX_NAMES) {
                try {
                    System.out.print("How many names? ");
                    n = scan.nextInt();
                    scan.nextLine();
                    if (n < 1 || n > MAX_NAMES) {
                        System.out.println("Please type a number between 1 and "+MAX_NAMES);
                    }
                } catch(InputMismatchException e) {
                    System.out.println("Please type an integer");
                    scan.nextLine(); // discard the invalid input
                }
            }
            return n;
        }

        public static String[] getNames(Scanner scan, int sz) {
            String[] names = new String[sz];
            for (int i = 0; i < sz; i++ ){
                System.out.print("type name # "+(i+1)+": ");
                names[i] = scan.nextLine();
            }
            return names;
        }

        public static int getWhich(Scanner scan,int length) {
            int x = -1;
            // Keep asking until the choice is a valid 1-based position.
            while (x < 1 || x > length) {
                try {
                    System.out.print("Which name: ");
                    x = scan.nextInt();
                    scan.nextLine();
                    if (x < 1 || x > length) {
                        System.out.println("Please type a number between 1 and "+length);
                    }
                } catch(InputMismatchException e) {
                    System.out.println("Please type an integer value");
                    scan.nextLine(); // discard the invalid input
                }
            }
            return x;
        }

        public static String getName(int n,String[] vals) {
            // Defensive check: never index outside the array.
            if (n >= 1 && n <= vals.length) {
                return vals[n-1];
            } else {
                return "";
            }
        }
    }

6. Input validation can often be particularly challenging for personal information. Imagine you're writing a program that will help users of a web site make a purchase. To do this, your program will ask them for credit card information. The credit card information will contain a 16-digit credit card number, the month and year of expiration, and a three-digit verification code. Write a program that will ask the user to type all of these values. Your program should validate each piece of input provided, and ask the user to retype the required data if necessary. Try to allow for as much flexibility in formatting as possible.
    import java.util.Scanner;

    public class CreditCardInformation {
        public static void main(String[] args) {
            Scanner scan = new Scanner(System.in);
            boolean valid = false;
            String cardNo = "";
            String month = "";
            String year = "";
            String verification = "";
            boolean cardNumberValid = false;
            boolean monthValid = false;
            boolean yearValid = false;
            boolean verificationValid = false;

            while (!valid) {
                // only ask again for the fields that are not yet valid
                if (!cardNumberValid) {
                    cardNo = getCreditCardNumber(scan);
                }
                if (!monthValid) {
                    month = getMonth(scan);
                }
                if (!yearValid) {
                    year = getYear(scan);
                }
                if (!verificationValid) {
                    verification = getVerificationNumber(scan);
                }

                cardNumberValid = validateCreditCardNumber(cardNo);
                monthValid = validateMonth(month);
                yearValid = validateYear(year);
                verificationValid = validateVerification(verification);
                valid = cardNumberValid && monthValid && yearValid && verificationValid;

                if (!valid) {
                    String s = "";
                    if (!cardNumberValid) {
                        s = s+"-The credit card number must be 16 digits.\n";
                    }
                    if (!monthValid) {
                        s = s+"-The month must be either the name of a month, \nthe three letter abbreviation for a month, or a number from 1-12.\n";
                    }
                    if (!yearValid) {
                        s = s+"-The year must be 2009 or later.\n";
                    }
                    if (!verificationValid) {
                        s = s+"-The verification number must be exactly 3 digits.\n";
                    }
                    System.out.println("-There were some errors in your input. ");
                    System.out.println(s);
                    System.out.println("-Please retype the appropriate fields");
                }
            }
            System.out.println("Card Information: ");
            System.out.println("Number: "+cardNo);
            System.out.println("Expiration: "+month+"-"+year);
            System.out.println("Verification: "+verification);
        }

        public static String getCreditCardNumber(Scanner s) {
            System.out.print("Please type the credit card number: ");
            return s.nextLine();
        }

        public static String getMonth(Scanner s) {
            System.out.print("Please type the month: ");
            return s.nextLine();
        }

        public static String getYear(Scanner s) {
            System.out.print("Please type the year: ");
            return s.nextLine();
        }

        public static String getVerificationNumber(Scanner s) {
            System.out.print("Please type the verification number: ");
            return s.nextLine();
        }

        public static boolean validateCreditCardNumber(String cardNo) {
            int digits = 0;
            for (int i = 0; i < cardNo.length(); i++) {
                char c = cardNo.charAt(i);
                if (c == ' ' || c == '-') {
                    // skip space or dash
                    continue;
                }
                if (c < '0' || c > '9') {
                    // any other non-digit character makes the number invalid
                    return false;
                }
                digits++; // count the digit
            }
            // must contain exactly 16 digits
            return digits == 16;
        }

        public static boolean validateMonth(String month) {
            String[] months = {"january","february","march","april","may","june","july",
                               "august","september","october","november","december"};
            // first, see if it's a number
            try {
                int m = Integer.parseInt(month);
                // it parsed as a number, so it is valid only if it's between 1 and 12
                return m >= 1 && m <= 12;
            } catch (NumberFormatException e) {
                // do nothing, because it can still be a valid month name
            }
            // must match one of the full month names or the first 3 letters as an abbreviation
            for (int i = 0; i < months.length; i++) {
                if (month.compareToIgnoreCase(months[i]) == 0) {
                    // matched the full name
                    return true;
                }
                String prefix = months[i].substring(0,3);
                if (month.compareToIgnoreCase(prefix) == 0) {
                    // matched the abbreviation
                    return true;
                }
            }
            // didn't match a month, so it must be no good
            return false;
        }

        public static boolean validateYear(String year) {
            try {
                // make it four digits instead of two, so 09 becomes 2009.
                // if the two characters are alphabetic, we get 20ab instead
                // of ab, which still won't parse
                if (year.length() == 2) {
                    year = "20"+year;
                }
                int y = Integer.parseInt(year);
                // years that have passed are no good
                return y >= 2009;
            } catch (NumberFormatException e) {
                // if it's not a number, it's not valid
                return false;
            }
        }

        public static boolean validateVerification(String verification) {
            // must be exactly three digits; a range check on the parsed
            // integer would wrongly accept short inputs such as "5"
            return verification.trim().matches("[0-9]{3}");
        }
    }

Note that still other checks are possible. We might compare the expiration month and year to the current month and year, verifying that they are in the future, and we might strip out all spaces and punctuation from the credit card number to convert it into a simple string of 16 digits. We might also use publicly available information about the structure of credit card numbers to verify that a provided number is plausible. See http://en.wikipedia.org/wiki/Credit_card_numbers for information on credit card numbers.

Security Checklist:

Vulnerability: Input Validation    Course: CS2

Task – Check each line of code    Completed

1. Mark each variable that receives external input with a V.
For each statement that is marked with a V, verify that the variable is checked for each of these criteria. Note any that it is not checked for: 1. Length 2. 
Range (reasonableness?) 3. Format 4. Type

Shaded areas indicate vulnerabilities!

Discussion Questions:

1. You're writing a program that asks the user to type in a telephone number. How might you validate that the characters that they've typed represent a legal telephone number? You should assume that you're only concerned about phone numbers from the US, but you want to give users as much flexibility as possible in terms of spaces and punctuation characters. List some rules that you might use. Make sure that you complete this question before moving on to question #2.
   1. Remove any parentheses, dashes, or spaces
   2. Verify that 10 digits remain

2. Find an example of a phone number that doesn't fit your rules.
   Anything that requires a leading 1, as in 1 410 555 1212. Any number specified with a “+” at the beginning: +1 410 555 1212

3. Describe an example of an input validation problem that you may have encountered. If you can't remember having any sort of problem, try some web pages or other software tools and try to find a system that fails to validate input data correctly.
   Taking zip codes without verifying 5 digits, accepting dates that have already passed, improper formats for phone numbers, etc.

4. If input is sufficiently cryptic, it might be hard to provide useful error messages in response to invalid input. Describe some strategies that might be used to help users recover from invalid input.
   Example formats indicating what correct inputs might look like. Error messages that describe the difficulties with the input as provided. Flexible inputs that allow users to correct multiple errors on one screen, as opposed to fixing them one at a time.
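The phone number rules from discussion question 1, together with the leading “1” and “+1” cases from question 2, can be sketched in Java roughly as follows. This is one possible set of rules, not a definitive validator. The class name PhoneSketch and the method normalizeUsPhone are hypothetical, and the choices to also skip periods and to accept a “+” only when it is followed by the country code 1 are assumptions made for the illustration. The sketch does not check whether the area code is actually in use.

```java
import java.util.Optional;

final class PhoneSketch {

    /**
     * Returns the 10 digits of a US phone number, or an empty Optional
     * if the text cannot be read as one under the rules below.
     */
    static Optional<String> normalizeUsPhone(String raw) {
        if (raw == null) {
            return Optional.empty();               // presence check
        }
        String s = raw.trim();

        // Assumption: a single leading '+' is allowed, but only for "+1".
        boolean plus = s.startsWith("+");
        if (plus) {
            s = s.substring(1);
        }

        StringBuilder digits = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c >= '0' && c <= '9') {
                digits.append(c);                  // keep digits
            } else if (c == ' ' || c == '-' || c == '(' || c == ')' || c == '.') {
                continue;                          // skip separators
            } else {
                return Optional.empty();           // anything else is invalid
            }
        }

        String d = digits.toString();
        if (d.length() == 11 && d.charAt(0) == '1') {
            d = d.substring(1);                    // drop the leading country code 1
        } else if (plus) {
            return Optional.empty();               // '+' without country code 1
        }

        // Length check: exactly 10 digits must remain.
        return d.length() == 10 ? Optional.of(d) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(normalizeUsPhone("(410) 555-1212").isPresent());  // true
        System.out.println(normalizeUsPhone("+1 410 555 1212").isPresent()); // true
        System.out.println(normalizeUsPhone("410 555 121").isPresent());     // false
    }
}
```

Returning the normalized digits rather than a simple true or false lets the caller store one canonical form, while an empty result signals that the user should be asked to retype the number.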