Data validation

Please use speaker notes for additional information!

All data, when it is initially entered into the system, should be checked for errors so that bad data does not get put onto permanent disk files. Remember the rule: "Garbage in, garbage out!" This process of error checking is called VALIDATING or EDITING.

[Systems flowchart: Data entered into the system; Validate or edit program; Good transactions; Errors (transactions that contain errors)]

In this systems flowchart, I am showing data being entered from a screen. The validate/edit program is checking the data. Data that passes the tests is written to the good transaction file. Data that contains errors is written to the screen. Other methodologies will be shown on the next few slides.

[Systems flowchart: Data being keyed in; Keyed in data stored on disk; Validate or edit program; Good transactions; Errors (transactions that contain errors)]

Here I am showing transactions being keyed in and stored on disk with no editing happening - this is just data entry.

[Systems flowchart: Data entered into the system; Validate or edit program; Good transactions; Errors (transactions that contain errors)]

Note the double arrow between the data being entered and the validate/edit program. This means that the data is being checked and feedback is going back to the person entering the data so they can correct errors.

[Systems flowchart: Data entered into the system; Validate or edit program; Good transactions; Errors (transactions that contain errors); reports of Good transactions and Invalid transactions]

Transactions that contain errors can be viewed on the screen, corrected, and then made good transactions if they have no errors. This would involve additional processing.

As can be seen, reporting can be an important part of editing. Both valid and invalid records can be written. Usually, if you are reporting valid and invalid transactions, they are done on separate reports, but sometimes you will see reports that mix valid and invalid record reporting. The report can be done using a variety of styles depending on the needs of the users.
The important thing is that on a report of valid transactions the entire record is printed if the purpose is a paper trail. On an error report, the reader must be able to identify the error so it can be fixed. The error report must contain:
* the id# or some other identifying field from the record
* the contents of the field that is in error
* an error message that explains the error

Examples of the kinds of errors that validating looks for:

Validating presence of data:
One common error is no data in a field where data is required. For example, id is frequently required, as are name, hours worked for a payroll problem, etc. A pseudocode example testing for the presence of a name is shown below:

    Set invalid indicator to no prior to entering the validate routine

    Validate routine
        if name = spaces
            report missing name
            set invalid indicator to yes
        end if

Validating data type:
The biggest issue here is non-numeric data in a numeric field. However, you can also validate for character data in a character field. Pseudocode is shown below. Assume that payhr is a numeric field and I want to make sure that no non-numeric data is entered in the field. Most languages have a way to ask if a field is numeric. Assume also that paycode should be an uppercase character field.

    Set invalid indicator to no prior to entering the validate routine

    Validate routine
        if payhr is not numeric
            report non numeric data in payhr
            set invalid indicator to yes
        end if
        if paycode < "A" OR paycode > "Z"
            report non uppercase character in paycode
            set invalid indicator to yes
        end if

Validating data codes:
The valid paycodes may be S, F, and P, and those are the only codes you want entered in that field.
    Set invalid indicator to no prior to entering the validate routine

    Validate routine
        if paycode = "S" OR paycode = "F" OR paycode = "P"
            no processing
        else
            report invalid paycode
            set invalid indicator to yes
        end if

Validating data range:

Top example:

    if paycode < "A" OR paycode > "Z"
        report non uppercase character in paycode
        set invalid indicator to yes
    end if

Bottom example:

    if paycode >= "A" AND paycode <= "Z"
        no processing
    else
        report non uppercase character in paycode
        set invalid indicator to yes
    end if

In the top example, I am saying that anything that is outside of the range is an error. I am using an OR because if it is outside the range on either end it is a problem. This could not be an AND because a character cannot be outside the range on both ends.

In the bottom example, I am saying that anything inside the range is valid and requires no processing. I am using the AND because both conditions must be true to make it inside the range. If either or both conditions are false, paycode is not in the range and I have an error.

[Flowcharts for the two tests. OR version: the decisions paycode < A and paycode > Z are tested in turn, and a Y on either one leads to "Error & set ind". AND version: the decisions paycode >= A and paycode <= Z are tested in turn, and an N on either one leads to "Error & set ind".]

Validating data range (another example):
For this example, I want to make sure that the payhr is within the range of 10 to 25.
    Set invalid indicator to no prior to entering the validate routine

    Validate routine
        if payhr >= 10.00 AND payhr <= 25.00
            no processing
        else
            report payhr out of range
            set invalid indicator to yes
        end if

The same test written with an OR:

    Set invalid indicator to no prior to entering the validate routine

    Validate routine
        if payhr < 10.00 OR payhr > 25.00
            report payhr out of range
            set invalid indicator to yes
        end if

Validating data range where the range is dependent on another field:
For this example, I want to make sure that the payhr is within the range of 10 to 25 for employees with the paycode F. Records with any other paycode must not be reported by this test, so the first version passes them through.

    Set invalid indicator to no prior to entering the validate routine

    Validate routine
        if paycode not = "F" OR (payhr >= 10.00 AND payhr <= 25.00)
            no processing
        else
            report payhr out of range
            set invalid indicator to yes
        end if

The same test written as a nested if:

    Set invalid indicator to no prior to entering the validate routine

    Validate routine
        if paycode = "F"
            if payhr < 10.00 OR payhr > 25.00
                report payhr out of range
                set invalid indicator to yes
            end if
        else
            ...

Validating reasonableness and consistency:
For this example, I am checking to see if the state is reasonable for the zipcode. If the zipcode is 02184 then the state must be MA. I will also check to see if the date of the payment is later than today's date (that will be shown on the next slide).

    Set invalid indicator to no prior to entering the validate routine

    Validate routine
        if zipcode = "02184"
            if state = "MA"
                no processing
            else
                report state inaccurate for zip code
                set invalid indicator to yes
            end if
        else
            ...

Validating reasonableness and consistency:
For this example, I will also check to see if the date of the payment is later than today's date.

    Set invalid indicator to no prior to entering the validate routine

    Validate routine
        if dateentered > todaysdate
            report date inaccurate
            set invalid indicator to yes
        end if
        ...
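The field checks shown so far can be gathered into one routine. The following is a minimal Python sketch, not the course's own code. It assumes each record is a dictionary keyed by the field names used in the pseudocode (id, name, payhr, paycode, zipcode, state, dateentered), that payhr arrives as text, and that dateentered is already a date object. The function name validate and the VALID_PAYCODES set are hypothetical. Each error is collected as the id, the contents of the field in error, and a message, which are the three items the error report must contain.

```python
from datetime import date

VALID_PAYCODES = {"S", "F", "P"}  # assumed set of valid codes, taken from the slides


def validate(record, today):
    """Return a list of (id, field contents, message); an empty list means valid."""
    errors = []

    def report(field, message):
        errors.append((record.get("id"), record.get(field), message))

    # Presence: name is required.
    if not record.get("name", "").strip():
        report("name", "missing name")

    # Data type: payhr must be numeric.
    payhr = None
    try:
        payhr = float(record.get("payhr", ""))
    except (TypeError, ValueError):
        report("payhr", "non numeric data in payhr")

    # Data codes: only S, F and P are allowed.
    paycode = record.get("paycode", "")
    if paycode not in VALID_PAYCODES:
        report("paycode", "invalid paycode")

    # Range dependent on another field: 10 to 25 applies to paycode F only.
    if paycode == "F" and payhr is not None and not (10.00 <= payhr <= 25.00):
        report("payhr", "payhr out of range")

    # Consistency: zipcode 02184 must go with state MA.
    if record.get("zipcode") == "02184" and record.get("state") != "MA":
        report("state", "state inaccurate for zip code")

    # Reasonableness: the date entered cannot be later than today.
    entered = record.get("dateentered")
    if entered is not None and entered > today:
        report("dateentered", "date inaccurate")

    return errors
```

The invalid indicator from the pseudocode corresponds to whether the returned list is non-empty. Returning every error, rather than stopping at the first one, lets a single pass report all the problems in a record.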
Validating group of fields together:
For this example, I will check to see if an employee has worked 40 hours (nothing more and nothing less) when I look at the regular hours, the vacation hours and the sick hours.

    Set invalid indicator to no prior to entering the validate routine

    Validate routine
        emphrs = reghrs + vacahrs + sickhrs
        if emphrs not = 40
            report error in employee hours worked
            set invalid indicator to yes
        end if
        ...

Validating fields together where some should be empty:
In this example, if the code is S then there should be data in the salary field, but no data in the pay per hour field.

    Set invalid indicator to no prior to entering the validate routine

    Validate routine
        if paycode = "S"
            if salary > 0
                if payhr = 0 OR payhr = spaces
                    no processing
                else
                    report error in pay per hour
                    set invalid indicator to yes
                end if
            else
                report error in salary
                set invalid indicator to yes
            end if
        else
            ...

Validating accuracy:
The biggest problem is validating accuracy within a valid range. This is almost impossible to do. For example, take a payment sent in to a credit card company: if the person sent 120 and 210 was entered, this would be very difficult to catch. Batch processing can be used to check this type of data.

Batch editing
- in batch editing, a group of transactions is gathered together as a batch; for example, 20 transactions might be called a batch
- each batch is given a number
- before data entry, the batch of transactions is gathered and totals are run on significant numeric fields; this might mean running a total on part number, on hand, cost, etc.
- as many totals can be gathered as needed
- this total information is entered into a batch header along with the batch number
- when the data is being keyed in, the batch header is keyed in followed by all of the transactions in the batch, and then another batch header followed by its transactions
- in the edit program the batch header is read and the totals on it are stored in memory, then the transactions are read one at a time and the same totals are accumulated (if you did part number, on hand and cost you would total the same three fields)
- when a new batch header is read, it is the signal that the old batch is complete and the totals are compared
- if the totals that were accumulated during processing do not match the totals from the batch header, the batch is considered to be unbalanced and the information is printed out
- the advantage of this system is that the unbalanced batch only involves 20 transactions, so finding the error or errors is a much less significant problem than searching for the errors in thousands of transactions

Validating accuracy:
Check Digit
- a check digit is the calculated last digit of an identification type of number such as employee number or item number
- for example, with an eight digit id # the first seven digits would be assigned and the eighth digit (the check digit) would be calculated using special formulas designed for this purpose
- the eighth digit now becomes part of the id
- any time the id is typed in, the calculation can be redone on the first seven digits to see if the answer is the same as the eighth digit; if it is, then the id is considered valid
- this is a great technique for catching transposition of digits etc.

ID: 5329012456

The check digit is created by running the rest of the id through a formula to produce the check digit before the ID is issued. Every time a transaction with that ID is processed, the calculation is redone to make sure that the last digit is what it should be.
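The slides do not say which formula produces the check digit, so the following Python sketch swaps in the Luhn algorithm, one widely used check digit scheme, purely as an illustration. The function names luhn_check_digit and is_valid_id and the sample id digits are hypothetical, and nothing here implies that the ID shown on the slide was issued with this formula.

```python
def luhn_check_digit(body: str) -> int:
    """Compute the Luhn check digit for a string of assigned decimal digits."""
    total = 0
    # Walk the assigned digits right to left, doubling every other digit,
    # starting with the digit that will sit next to the check digit.
    for position, char in enumerate(reversed(body)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return (10 - total % 10) % 10


def is_valid_id(full_id: str) -> bool:
    """Redo the calculation on all but the last digit and compare with the last."""
    if len(full_id) < 2 or not (full_id.isascii() and full_id.isdigit()):
        return False
    return luhn_check_digit(full_id[:-1]) == int(full_id[-1])


# Issuing an eight digit id: seven assigned digits plus the calculated digit.
assigned = "1234567"                                   # hypothetical assigned digits
issued = assigned + str(luhn_check_digit(assigned))    # "12345674"

print(is_valid_id(issued))       # True
print(is_valid_id("12345764"))   # False: the 6 and 7 were transposed
```

Luhn detects any single mistyped digit and most transpositions of adjacent digits (swapping 0 and 9 is the known exception), which is the kind of keying error the slide describes.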
[Logic flowchart, reconstructed as an outline]

Mainline
    Housekeeping
    Process
    Wrapup
End Mainline

Housekeeping
    Set up variables
    Open files
End Housekeeping

Process
    Read a record
    Not EOF?
        Y: Process Record Loop
        N: End Process
End Process

Wrapup
    Close Files
End Wrapup

Process record loop
    Set invalid indicator to no
    Validate Routine
    invalid indicator = no?
        Y: Write to good transactions
    Read data to edit
End Process record loop

Validate routine
    Name = spaces?
        Y: Write to error report
           Set invalid indicator to yes
    Amt > 5000?
        Y: Write to error report
           Set invalid indicator to yes
End Validate routine
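The mainline, process record loop, and validate routine named in the flowchart can be sketched in Python as follows. This is an illustration under assumptions, not the course's program: records are taken to be dictionaries with hypothetical keys id, name, and amt, and in-memory lists stand in for the good transaction file and the error report so the example runs without any files.

```python
def validate_routine(record, error_report):
    """Apply the two checks from the flowchart; return True if the record is invalid."""
    invalid = False
    if not record["name"].strip():            # Name = spaces
        error_report.append((record["id"], record["name"], "missing name"))
        invalid = True
    if record["amt"] > 5000:                  # Amt > 5000
        error_report.append((record["id"], record["amt"], "amt over 5000"))
        invalid = True
    return invalid


def mainline(records):
    # Housekeeping: set up variables (lists stand in for the opened files).
    good_transactions = []
    error_report = []

    # Process: read records until end of file.
    for record in records:
        # Process record loop: the indicator starts as "no" for every record.
        invalid = validate_routine(record, error_report)
        if not invalid:
            good_transactions.append(record)

    # Wrapup: nothing to close here; hand the results back to the caller.
    return good_transactions, error_report


sample = [
    {"id": "A1", "name": "Ann Lee", "amt": 120},
    {"id": "A2", "name": "   ", "amt": 6000},
]
good, errors = mainline(sample)
print(len(good), len(errors))   # 1 2
```

Each error entry carries the id, the contents of the field in error, and a message, so one record can appear on the error report more than once when it fails more than one check.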