Program Development, May 06: Data Validation
Damien Costello, Dept of Computing & Maths, GMIT

• Ideally, all presented data should be fully correct and complete.
  – In practice this is not always achieved.
• Problems of illegible entries are alleviated by:
  – better form design
  – typing instead of handwriting
  – scanning
• External documents can be vetted first only if they are turnaround documents sent to external sources for approval; otherwise, external documents are difficult to control.
• Errors must be detected as early as possible.
• External data should be checked more rigorously, as it is harder to control.
• Main types of error:
  – source read errors, data preparation errors, incorrect batches of input, missing data, duplicated data, incorrect file records.

• Data validation involves looking up a record to ensure that the data exists.
  – Before processing an order, check that the customer number is correct.
  – Discarding inputs for non-existent data minimises a number of difficulties.
  – Reports produced after batch processing show anomalies and errors.
  – Validity checks reduce confusion and backtracking.
• Different types:
  – Input Validation
  – Feasibility Checking: looks for possible and definite errors
  – Check Digits
• Input Validation
  – More of an absolute proof than feasibility checking.
  – The system checks the data against known values.
• Flashback Checks / Echo Checks
  – Respond to input by returning data that confirms, or strongly implies, the accuracy of the input.
  – e.g. input of an account number results in the account holder's name being returned.
  – Provides confirmation of correctness.
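Input validation against known values, followed by a flashback (echo) check, could be sketched in Python as follows. This is a minimal illustration only: the master file `ACCOUNTS`, its contents and the functions `input_validation` and `echo_check` are hypothetical stand-ins for a real customer or account file lookup.

```python
# Hypothetical master file: account number -> account holder name.
ACCOUNTS = {"10234": "A. Murphy", "10567": "B. Walsh"}


def input_validation(account_no: str) -> str:
    """Input validation: the account number must match a known record."""
    if account_no not in ACCOUNTS:
        raise ValueError(f"account {account_no!r} does not exist")
    return ACCOUNTS[account_no]


def echo_check(account_no: str) -> str:
    """Flashback/echo check: return the holder's name so the operator can confirm the input."""
    name = input_validation(account_no)
    return f"Account {account_no}: {name}. Correct? (y/n)"


print(echo_check("10234"))  # Account 10234: A. Murphy. Correct? (y/n)
try:
    echo_check("99999")
except ValueError as err:
    print("Rejected:", err)  # Rejected: account '99999' does not exist
```

The lookup alone proves that the record exists; echoing the name back also lets the operator catch a number that exists but belongs to the wrong customer, which an existence check on its own cannot detect.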
• Feasibility Checking
  – The following types of checks are defined:
    • picture checks, limit checks, fragmented limit checks
    • restricted value checks, combination checks
    • compatibility checks, probability checks
    • dependency checks, check digits

• Picture Checks
  – The data item's picture (its format) is checked against the defined picture.
  – Any difference results in rejection.
  – There should be only one picture for a data item, whether at input, at output, or at an intermediate stage of processing.
  – In some cases several pictures are applicable (e.g. number plates).
• Limit Checks
  – Every data item has a maximum and a minimum value.
  – Limits may be set by the width of the data field:
    • a 4-digit number field allows 0000 to 9999.
  – Sometimes a check against acceptable limits is used instead.
  – Usually applied to code numbers or dates.
• Fragmented Limit Checks (sub-ranges)
  – e.g. if a code starts with D, the number that follows must be in the range 1200 to 3000.

• Restricted Value Checks
  – The data item can only have predetermined values.
  – e.g. a commodity sold in round dozens: the order size must be a multiple of 12.
  – The check can be bypassed by using a drop-down list box that allows the user to pick only predetermined values.
• Combination Checks
  – Combination: the joining of two or more data items.
  – Data items may pass their own limit checks but become infeasible when combined with others.
  – Example:
    • the order quantity limit is 50
    • unit prices range from £1 to £80
    • business rule: orders from £1 to £100 are accepted
    • an order of 50 units at £80 passes the individual limit checks but fails the combination check on total price (50 × £80 = £4,000).

• Compatibility Checks
  – Two or more data items are checked for mutual consistency.
  – The parameters of the check on one data item are imposed by the other.
  – Example:
    • a customer orders £50 worth of goods: acceptable
    • the customer pays by credit card (the credit limit is exceeded)
    • the order cannot be processed: compatibility breach

• Probability Checks
  – For any of the previous checks, the result is judged against a probability table of the data being erroneous.
  – This minimises investigative and corrective work following the detection of a possible error:
    • data just outside the limits is reported but allowed to proceed
    • data with greater variance is suspended pending investigation.

• Dependency Checks
  – If one data item is present, then another must also be present.
• Check Digits (self-checking numbers)
  – Appended to a code number in order to detect errors arising when the number is transcribed manually.
  – Detect a high proportion of errors.
  – Use of check digits depends on the length of the code number:
    • codes longer than 5 digits have a greater possibility of error, so check digits should be used.
  – Types of errors:
    • transcription: errors in copying
    • transposition: two adjacent digits are interchanged
    • others: single and multiple shifts, double transpositions, insertion and deletion of digits

• Modulus 11 Check Digit System
  – The most commonly used system.
  – Provides a high level of security.
  – How does it work?
    • Multiply each digit in the code by its weight (2 for the least significant digit, 3 for the next, and so on).
    • Add the products.
    • Divide the sum by 11.
    • If the remainder is 0, the check digit is also 0.
    • If the remainder is not 0, subtract it from 11 to give the check digit. (A remainder of 1 gives 10, which is not a single digit; ISBN-10, for example, writes it as X.)

• Example: code number 27935

    Code           2    7    9    3    5
    Weights        6    5    4    3    2
    Products      12   35   36    9   10
    Sum          102
    Sum / 11       9 remainder 3
    Check digit   11 - 3 = 8
    New code     279358

• Validating a Modulus 11 number
  – The check digit is given a weighting of 1.
  – Multiply each digit in the code by its weight.
  – Add together all the products.
  – Divide the sum by 11.
  – If the remainder is 0, the code number is correct.

• Example: verifying the code number 279358

    Code           2    7    9    3    5    8
    Weights        6    5    4    3    2    1
    Products      12   35   36    9   10    8
    Sum          110
    Sum / 11      10 remainder 0

  The remainder is 0, so the check digit is correct and the code number is valid.

• The larger the divisor (and hence the range of possible remainders), the greater the system's degree of security.
  – It is convenient to use 23 as the divisor:
    • 23 is the largest prime below 26
    • the remainders 0 to 22 can be expressed as letters of the alphabet
    • this keeps the overall length of the code down
  – Few other check digit systems are put into practical use.

• A smooth means of dealing with errors and omissions is needed.
• There are a number of ways:
  – accept, flag, and allow through with no alteration
    • corrected by an error correction run
  – reject the data and/or transaction, sending a message to the originator
    • log erroneous data for later checking
  – accept the item, but flag it as erroneous
    • dealt with by a procedure such as correction by special operators using VDU terminals
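The single-item feasibility checks described above (picture, limit, fragmented limit and restricted value) could be sketched in Python as below. This is a hedged illustration, not a standard: the picture symbols (9 = digit, A = letter, X = any character) are an assumed COBOL-style convention, and `SUB_RANGES` and `PLATE_PICTURES` are hypothetical tables, the plate formats being simplified examples only.

```python
import re


def picture_check(value: str, picture: str) -> bool:
    """Compare a value against a picture, assuming 9 = digit, A = letter,
    X = any character; any other character must match literally."""
    symbols = {"9": r"[0-9]", "A": r"[A-Za-z]", "X": r"."}
    pattern = "".join(symbols.get(ch, re.escape(ch)) for ch in picture)
    return re.fullmatch(pattern, value) is not None


# Several pictures may be acceptable for one item; these plate formats are hypothetical.
PLATE_PICTURES = ["99-A-9999", "99-AA-9999"]


def plate_check(plate: str) -> bool:
    return any(picture_check(plate, p) for p in PLATE_PICTURES)


def limit_check(value: int, low: int, high: int) -> bool:
    """Limit check: the value must lie between its minimum and maximum, inclusive."""
    return low <= value <= high


# Hypothetical sub-range table: code prefix -> allowed range for the number that follows.
SUB_RANGES = {"D": (1200, 3000)}


def fragmented_limit_check(code: str) -> bool:
    """Fragmented limit check: the allowed range depends on the code's first character."""
    prefix, digits = code[:1], code[1:]
    if prefix not in SUB_RANGES or not digits.isdecimal():
        return False
    low, high = SUB_RANGES[prefix]
    return limit_check(int(digits), low, high)


def restricted_value_check(order_size: int) -> bool:
    """Restricted value check: goods sold in round dozens.
    Excluding zero is an assumption; the notes only require a multiple of 12."""
    return order_size > 0 and order_size % 12 == 0


print(picture_check("0042", "9999"))    # True
print(picture_check("42A", "9999"))     # False
print(plate_check("06-G-1234"))         # True
print(limit_check(10000, 0, 9999))      # False: too wide for a 4-digit field
print(fragmented_limit_check("D1500"))  # True
print(fragmented_limit_check("D3500"))  # False
print(restricted_value_check(36))       # True
```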
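The checks that relate two or more data items (combination, compatibility and dependency) might be sketched as follows, reusing the figures from the order and credit card examples in the notes. The minimum quantity of 1, the `credit_available` parameter, the rule that non-card payments always pass, and the field names in the dependency example are all assumptions made for illustration.

```python
def combination_check(quantity: int, unit_price: float) -> list[str]:
    """Return the names of failed checks for the order example:
    quantity 1-50 (lower bound assumed), unit price £1-£80, order total £1-£100."""
    failures = []
    if not 1 <= quantity <= 50:
        failures.append("quantity")
    if not 1 <= unit_price <= 80:
        failures.append("unit price")
    if not 1 <= quantity * unit_price <= 100:
        failures.append("total price")
    return failures


def compatibility_check(order_value: float, payment_method: str, credit_available: float) -> bool:
    """Compatibility check: a card payment must fit within the remaining credit limit.
    Treating every other payment method as compatible is an assumption."""
    if payment_method == "credit card":
        return order_value <= credit_available
    return True


def dependency_check(record: dict, present_field: str, required_field: str) -> bool:
    """Dependency check: if present_field has a value, required_field must have one too."""
    if record.get(present_field) is None:
        return True
    return record.get(required_field) is not None


print(combination_check(50, 80))                   # ['total price']
print(combination_check(2, 40))                    # []
print(compatibility_check(50, "credit card", 30))  # False: compatibility breach
# Hypothetical fields: a card number without an expiry date fails the dependency check.
print(dependency_check({"card_number": "1234", "expiry": None}, "card_number", "expiry"))  # False
```

Returning the list of failed checks, rather than a single flag, reflects the point made in the notes: each item can pass its own limit check while the combination still fails.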
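A probability check, combined with the accept/flag and suspend/log options for dealing with errors described above, could be sketched as follows. The notes judge results against a probability table; this sketch swaps in a single tolerance band around the limits as a simplifying assumption, and the `Outcome` container and `route` function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Outcome:
    accepted: list = field(default_factory=list)
    flagged: list = field(default_factory=list)    # accepted, but left for an error correction run
    suspended: list = field(default_factory=list)  # logged for later checking


def probability_grade(value: float, low: float, high: float, tolerance: float) -> str:
    """Grade a value using one tolerance band (a stand-in for a probability table)."""
    if low <= value <= high:
        return "accept"
    if low - tolerance <= value <= high + tolerance:
        return "report"   # just outside the limits: report but allow to proceed
    return "suspend"      # greater variance: suspend pending investigation


def route(values, low, high, tolerance) -> Outcome:
    out = Outcome()
    for v in values:
        grade = probability_grade(v, low, high, tolerance)
        if grade == "accept":
            out.accepted.append(v)
        elif grade == "report":
            out.accepted.append(v)
            out.flagged.append(v)
        else:
            out.suspended.append(v)
    return out


result = route([30, 52, 70], low=1, high=50, tolerance=5)
print(result.accepted)   # [30, 52]
print(result.flagged)    # [52]
print(result.suspended)  # [70]
```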
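The Modulus 11 procedure described above (weights 2, 3, 4, ... from the least significant digit; the check digit weighted 1 when validating) can be sketched in Python as follows. Writing a check value of 10 as "X" follows the ISBN-10 convention and is an assumption here, since the notes do not cover that case; some systems simply never issue such codes.

```python
def mod11_check_digit(code: str) -> str:
    """Return the Modulus 11 check digit for a string of decimal digits."""
    # Weight 2 for the rightmost digit, 3 for the next, and so on.
    total = sum(int(d) * w for w, d in enumerate(reversed(code), start=2))
    remainder = total % 11
    check = 0 if remainder == 0 else 11 - remainder
    return "X" if check == 10 else str(check)  # "X" for 10 is an assumed convention


def mod11_is_valid(code_with_check: str) -> bool:
    """Validate: weight the check digit 1 and the others 2, 3, ...; valid if the sum divides by 11."""
    *body, last = code_with_check
    check_value = 10 if last in "Xx" else int(last)
    total = check_value + sum(int(d) * w for w, d in enumerate(reversed(body), start=2))
    return total % 11 == 0


print(mod11_check_digit("27935"))  # 8 (sum 102, remainder 3, 11 - 3 = 8)
print(mod11_is_valid("279358"))    # True (sum 110, remainder 0)
print(mod11_is_valid("273958"))    # False: adjacent digits 9 and 3 transposed
```

The last line shows why the system catches transpositions: swapping two adjacent digits changes their weights, so the weighted sum (104 here) is no longer a multiple of 11. The same arithmetic reproduces the ISBN-10 check digit, e.g. 2 for the body 030640615.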
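The divisor-23 variant, where the check value is expressed as a letter, might look like the sketch below. The notes do not give the letter mapping or the weights, so both are assumptions: this version reuses the Modulus 11 weighting (2, 3, 4, ... from the right, check letter weighted 1) and maps the values 0 to 22 onto the letters A to W.

```python
import string

# Assumed mapping: 'A'..'W' stand for the check values 0..22.
LETTERS = string.ascii_uppercase[:23]


def mod23_check_letter(code: str) -> str:
    """Choose the letter whose value makes (weighted sum + value) divisible by 23."""
    total = sum(int(d) * w for w, d in enumerate(reversed(code), start=2))
    return LETTERS[(-total) % 23]


def mod23_is_valid(code_with_check: str) -> bool:
    """Validate: the check letter is weighted 1, the digits 2, 3, ... from the right."""
    body, letter = code_with_check[:-1], code_with_check[-1]
    if letter not in LETTERS:
        return False
    total = LETTERS.index(letter) + sum(
        int(d) * w for w, d in enumerate(reversed(body), start=2)
    )
    return total % 23 == 0


print(mod23_check_letter("27935"))  # N (sum 102; 102 + 13 = 115 = 5 x 23)
print(mod23_is_valid("27935N"))     # True
print(mod23_is_valid("27395N"))     # False: transposition detected
```

A single letter carries 23 possible check values, compared with 11 for Modulus 11, without making the code any longer.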