Input Validation – “All input is evil”
CS1

Background

Summary:
Any input that comes into a program from an external source, such as a user typing at a keyboard or a network connection, can be a source of security concerns and disastrous bugs. All input should be treated as potentially dangerous.

Description:
All interesting software packages rely upon external input. Although information typed at a computer might be the most familiar, networks and external devices can also send data to a program. Generally, this data will be of a specific type: for example, a user interface that requests a person’s name might be written to expect a series of alphabetic characters. If data of the correct type and form is provided, the program might work fine. However, if programs are not carefully written, attackers can construct inputs that cause malicious code to be executed.

Risk – How can it happen?
Any data that enters your program from an external source is a potential source of problems. If external data is not checked to verify that it has the right type of information, the right amount of information, and the right structure of information, it can cause problems. Input that is not properly validated can affect any type of computer program, from word processors to web servers and relational databases.

Example of Occurrence:
The Risks Digest (http://catless.ncl.ac.uk/Risks), an invaluable resource on computing systems gone wrong, carried a report of an electronic commerce web site that failed to verify the quantity of items ordered. After accidentally typing “1.1” for the desired quantity of an item (instead of one), an amused customer found that the system would let him order 1.1 cocktail shakers at $9.99 each, for a total of $10.99. A simple check to verify that the quantity was an integer value would have eliminated the absurd possibility of ordering one-tenth of a cocktail shaker.
Source: Richard Kaszeta, “Lack of sanity checking in Web shopping cart software,” Risks Digest, 23(51), http://catless.ncl.ac.uk/Risks/23.51.html#subj11

Example in Code:
This program asks the user to type in an even number. It will then print all of the even numbers that are greater than or equal to zero and less than the number typed:

    import java.util.Scanner;

    public class InputValidationExample {
        public static void main(String[] args) {
            Scanner scan = new Scanner(System.in);
            System.out.println("Please type an even number: ");
            int x = scan.nextInt();
            for (int i = 0; i != x; i+=2) {
                System.out.println(i);
            }
        }
    }

This code has two input validation problems. The first involves the use of the Scanner to get an integer as typed by the user. If the user types an integer, this will work without any problems. However, if a floating point value (such as “3.2”) or a string (such as “Hello”) is typed, Java will throw an exception (an InputMismatchException). A robust program would catch this error, provide a clear and appropriate error message, and ask the person to re-type their input.

The second problem involves the lack of validation of the requirement that the integer be even. If the user types an even number, this program will run perfectly well. However, if an odd number is provided, there will be trouble: as the counter i starts from zero and increases by two with each iteration, it will never be equal to an odd number, and the loop will not terminate.

As a careful developer, you should use a two-part strategy to avoid this second problem. In the first part, you should verify that the number provided by the user is indeed even. If it is not, your program should display an error message and repeat the request, not proceeding until an even number is provided. The second part involves revising the loop. The loop above will continue as long as the loop counter i is not equal to x, the value provided. A more robust solution would be to continue the loop as long as i is less than x.
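A minimal sketch of this two-part strategy might look like the following. The class name ValidatedEvenInput and the helper methods readEvenNumber and printEvensBelow are hypothetical names chosen for illustration; they are not part of the original program, and other designs are equally reasonable.

```java
import java.io.PrintStream;
import java.util.Scanner;

// Hypothetical revision of InputValidationExample; all names are illustrative.
public class ValidatedEvenInput {

    // Part 1: keep asking until the user types an even integer.
    static int readEvenNumber(Scanner scan) {
        while (true) {
            System.out.println("Please type an even number: ");
            if (!scan.hasNextInt()) {
                if (!scan.hasNext()) {
                    // The input has ended, so there is nothing left to re-read.
                    throw new IllegalStateException("No more input available");
                }
                // The next token is not an int (for example "3.2" or "Hello"):
                // discard it and repeat the request.
                scan.next();
                System.out.println("That is not a whole number. Please try again.");
                continue;
            }
            int x = scan.nextInt();
            if (x % 2 == 0) {
                return x;
            }
            System.out.println(x + " is not even. Please try again.");
        }
    }

    // Part 2: the loop condition is i < x rather than i != x, so the loop
    // stops even if an odd value of x reaches it. The counter is a long so
    // that adding 2 cannot overflow when x is close to Integer.MAX_VALUE.
    static void printEvensBelow(int x, PrintStream out) {
        for (long i = 0; i < x; i += 2) {
            out.println(i);
        }
    }

    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        int x = readEvenNumber(scan);
        printEvensBelow(x, System.out);
    }
}
```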
Thus, even if there was a problem with your input validation, the loop would still stop.

How can I avoid input validation problems?

Check your input:
The basic rule for input validation is to check that input data matches all of the constraints that it must meet to be used correctly in the given circumstance. In many cases, this can be very difficult: confirming that a set of digits is, in fact, a telephone number may require consideration of the many differing phone number formats used by countries around the world. Some of the checks that you might want to use include: data type, range of values, length of input, and format. If you ask for a date and someone gives you a twelve-digit number, it's probably wrong. Other places where you might run into input validation problems include accessing items in an array or getting substrings out of a string: if you access indices in an array or positions in a string that are outside of the limits of the array (or string), you may run into trouble. Some programming languages have tools that provide general input validation support or specific support for handling common input formats. These facilities should be used whenever possible.

Recover Appropriately:
A robust program will respond to invalid input in a manner that is appropriate, correct, and secure. When your program runs across invalid input, it should recover as much as possible, and then repeat the request, or otherwise continue on. Arbitrary decisions such as truncating or otherwise reformatting data to “make it fit” should be avoided.

Laboratory/Homework Assignment:
Consider this program, which asks the user to type in a string and a position in that string. This program uses the substring method to get a substring of the characters in the string, starting with the given index and going to the end of the string. Thus, if the user types “Hello, World”, and the integer 7, the program will print “World” as the substring, because “World” starts at position 7.
    import java.util.*;

    public class UnvalidatedInput {
        public static void main(String[] args) {
            Scanner scan = new Scanner(System.in);
            System.out.print("Please enter a string: ");
            String s = scan.nextLine();
            System.out.print("Please enter a starting position: ");
            int x = scan.nextInt();
            String sub= s.substring(x);
            System.out.println("The substring starting from "+x+" is "+sub);
        }
    }

1. Complete the following checklist for this program.
2. List the potential input validation errors.
3. Provide example inputs that might cause validation problems and describe the problems that they might cause.
4. What happens if you type non-numeric characters for the starting position?
5. Write a program that asks a user for their day, month, and year of birth. Make sure that each of these values is validated appropriately.

Security Checklist:

Vulnerability: Input Validation
Course: CS2

Task – Check each line of code | Completed
1. Mark each variable declaration with a V. |
2. Mark all external inputs to these variables with a V. |
3. Identify all uses of these variables that might lead to problems if the input is not validated. Mark them with a V. |

Shaded areas indicate vulnerabilities!

Discussion Questions:
1. Describe an example of an input validation problem that you have encountered. If you can't remember having any sort of problem, try some web pages or other software tools and look for a system that fails to validate input data correctly.
2. Imagine having the chance to talk to the folks who built the system that you identified (in question 1) as having an input validation problem. What might you suggest that they do to fix this problem? Are there multiple approaches that they might use?
3. In problem 5 above, you were asked to validate the year of an individual's birth. What assumptions does your code make about birthdates? How would your program differ if you were dealing with historical figures, or people who lived more than two thousand years ago?