Data Validation

Program Development
May 06
• All presented data should be fully correct and complete… ideally
  – in practice, not always achieved
• Must try to detect errors early on
• Check external data more rigorously as it is harder to control
• Main types of error
  – source read, data preparation, incorrect batches of input, missing data, duplicated data, incorrect file records.
• Problems of illegible entries alleviated by:
  – better form design
  – typing instead of handwriting
  – scanning
  – external documents can be vetted first only if they are turnaround documents and sent to external sources for approval
  – otherwise external documents are difficult to control
• Different Types:-
  – Input Validation, Feasibility Checking, Check Digits
• Input Validation
  – More of an absolute proof than feasibility checking.
  – System checks the data against known values.
  – Looking up a record to ensure that the data exists.
    • Before processing an order, check that the customer number is correct.
  – By discarding inputs for non-existent data, a number of difficulties are minimised.
  – Reports after batch processing show anomalies/errors.
  – Validity checks reduce confusion and backtracking.
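Input validation by lookup can be sketched as follows. This is a minimal illustration only: the customer table, the field name customer_number and both function names are hypothetical, and a real system would look the number up in its customer master file rather than in a dictionary.

```python
# Hypothetical customer master data; a real system would query a file or database.
KNOWN_CUSTOMERS = {"C1001": "A. Murphy", "C1002": "B. Walsh"}


def validate_order(order, customers=KNOWN_CUSTOMERS):
    """Check the order's customer number against known values."""
    number = order.get("customer_number")
    if number not in customers:
        return False, f"rejected: unknown customer number {number!r}"
    return True, "accepted"


def process_batch(orders):
    """Split a batch into accepted orders and an anomaly report."""
    accepted, anomalies = [], []
    for order in orders:
        ok, message = validate_order(order)
        if ok:
            accepted.append(order)
        else:
            # Discard the input early and record it for the post-batch report.
            anomalies.append((order, message))
    return accepted, anomalies
```

Orders with an unknown customer number never reach processing; they appear only in the anomaly list, which plays the role of the report produced after batch processing.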
• Input Validation: Flashback checks / Echo checks
  – Respond to input by returning data that confirms or strongly implies the accuracy of the input.
  – Input of an account number results in the account holder's name being returned.
  – Provides confirmation of correctness.
• Feasibility Checking
  – Look for possible and definite errors.
  – The following types of checks are defined:
    • picture checks, limit checks, fragmented limit checks
    • restricted value checks, combination checks
    • compatibility checks, probability checks
    • dependency checks, check digits
Damien Costello, Dept of Computing & Maths, GMIT
• Picture Checks
  – Data item’s picture is checked against the defined picture.
  – Any difference results in rejection.
  – Should be only one picture for a data item.
    • in some cases several are applicable (number plates)
  – Usually applied to code numbers or dates.
• Limit Checks
  – Every data item has max and min value.
    • input, output, intermediate stage of processing
  – Limits may be set by the width of the data field.
    • 4 digit number field - 0000 - 9999
  – Sometimes a check against acceptable limits is used.
• Fragmented Limit Check - sub range.
  – If code starts with D, following must be 1200 - 3000.
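The picture, limit and fragmented limit checks can be sketched in a few lines. The picture used here (one capital letter followed by four digits) and all names are assumptions made for illustration; only the rule for codes starting with D (1200 - 3000) comes from the slides.

```python
import re

# Assumed picture for a hypothetical code: one capital letter, then four digits.
CODE_PICTURE = re.compile(r"[A-Z][0-9]{4}")

# Sub-ranges keyed by leading letter. The "D" entry is the slide's example;
# further prefixes would be added in the same way.
FRAGMENTED_LIMITS = {"D": (1200, 3000)}


def picture_check(value, picture=CODE_PICTURE):
    """Accept the value only if it matches the defined picture exactly."""
    return picture.fullmatch(value) is not None


def limit_check(value, minimum, maximum):
    """Accept the value only if it lies between its min and max (inclusive)."""
    return minimum <= value <= maximum


def fragmented_limit_check(code, limits=FRAGMENTED_LIMITS):
    """Apply the sub-range that belongs to the code's leading letter."""
    if not picture_check(code):
        return False
    prefix, number = code[0], int(code[1:])
    if prefix not in limits:
        return True  # no sub-range defined for this prefix
    low, high = limits[prefix]
    return limit_check(number, low, high)
```

A four-digit numeric field is covered by limit_check(value, 0, 9999); the fragmented check narrows that range once the leading letter is known.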
• Restricted Value
  – Data item can only have pre-determined values.
  – Commodity sold in round dozens - order size must be multiple of 12.
  – Bypassed by using drop down list box allowing the user to pick only pre-determined values.
• Combination Checks
  – Combination - joining of two or more data items.
  – Data items may pass their own limit checks but become unfeasible when combined with others.
  – Example
    • order quantity limit is 50
    • unit prices range from £1 - £80
    • business rule - orders from £1 - £100 accepted
    • order of 50 units at £80 each passes the individual limit checks but fails the combination check for total price
• Compatibility Checks
  – Two or more data items are checked for mutual consistency.
  – Parameters of check on one data item are imposed by the other.
  – Example
    • customer orders £50 worth of goods - acceptable
    • pays by credit card (limit is exceeded)
    • order cannot be processed - compatibility breach
• Probability Checks
  – For any of the previous checks, the result is judged against a probability table of the data being erroneous.
  – Minimise investigative and corrective work following the detection of a possible error.
    • Data just outside limits would be reported but allowed to proceed.
    • Data with greater variance results in suspension pending investigation.
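The four checks above can be sketched as small functions. The figures (multiples of 12, quantity limit 50, unit price £1 - £80, order total £1 - £100) are taken from the slides; the function names, the minimum quantity of 1 and the 10% tolerance band are assumptions. In particular, the tolerance band is a simple stand-in for the probability table, which the slides do not define.

```python
def restricted_value_check(quantity, multiple=12):
    """Commodity sold in round dozens: quantity must be a multiple of 12."""
    return quantity > 0 and quantity % multiple == 0


def combination_check(quantity, unit_price, max_quantity=50,
                      price_range=(1, 80), total_range=(1, 100)):
    """Return the list of failed checks (empty if the order is feasible)."""
    failures = []
    if not 1 <= quantity <= max_quantity:
        failures.append("quantity limit")
    if not price_range[0] <= unit_price <= price_range[1]:
        failures.append("unit price limit")
    total = quantity * unit_price
    if not total_range[0] <= total <= total_range[1]:
        failures.append("combination: total price")
    return failures


def compatibility_check(order_value, credit_remaining):
    """The limit on the order value is imposed by another item: the credit left."""
    return order_value <= credit_remaining


def probability_check(value, low, high, tolerance=0.10):
    """Grade a limit failure instead of rejecting outright (assumed 10% band)."""
    if low <= value <= high:
        return "accept"
    margin = (high - low) * tolerance
    if low - margin <= value <= high + margin:
        return "report"   # just outside limits: reported but allowed to proceed
    return "suspend"      # greater variance: held pending investigation
```

With these defaults, combination_check(50, 80) passes both individual limits but reports "combination: total price", matching the slide's example.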
• Dependency Check
  – If one data item is present, then so must another.
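A dependency check can be sketched with a small rule table. The field names and rules below are hypothetical examples, not taken from the slides.

```python
# Hypothetical rules: if the key field is present, the listed fields must be too.
DEPENDENCIES = {
    "credit_card_number": ["card_expiry_date"],
    "delivery_date": ["delivery_address"],
}


def is_present(record, field):
    """Treat missing, None and blank values as absent."""
    value = record.get(field)
    return value is not None and str(value).strip() != ""


def dependency_check(record, dependencies=DEPENDENCIES):
    """Return (field, missing_dependent_field) pairs; empty if the record passes."""
    missing = []
    for field, required in dependencies.items():
        if is_present(record, field):
            for other in required:
                if not is_present(record, other):
                    missing.append((field, other))
    return missing
```

A record without a credit card number passes regardless of the expiry date; the check only fires when the first item is present and its dependent item is not.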
• Check Digits (self-checking numbers)
  – Appended to a code number in order to detect errors arising when the number is transcribed manually.
  – Detects a high proportion of errors.
  – Use of check digits depends on the length of the code number:
    • Codes longer than 5 digits have a greater possibility for error, so use check digits.
  – Types of errors
    • transcription - a digit is copied incorrectly
    • transposition - two adjacent digits are interchanged
    • others - single & multiple shifts, double transpositions, insertion & deletion of digits
• Modulus 11 Check Digit system
  – most commonly used
  – provides high level of security
  – how does it work?
    • multiply each digit in the code by its weight (the weight for the least significant digit is 2, for the next digit 3, etc.)
    • add the above products
    • divide this sum by 11
    • if the remainder is 0, the check digit is also 0
    • if the remainder is not 0, subtract it from 11 to give the check digit
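The steps above can be sketched as follows. The function names are illustrative. One case is not covered by the slides: a remainder of 1 gives 11 - 1 = 10, which is not a single digit. This sketch writes it as "X", borrowing the ISBN-10 convention; that choice is an assumption, and some schemes simply avoid issuing such codes.

```python
def weighted_sum(digits, first_weight):
    """Weight the rightmost digit by first_weight, the next by one more, etc."""
    return sum(int(d) * w
               for w, d in enumerate(reversed(digits), start=first_weight))


def modulus11_check_digit(code):
    """Compute the Modulus 11 check digit for a string of decimal digits."""
    if not (code.isascii() and code.isdigit()):
        raise ValueError("code must be a non-empty string of digits")
    remainder = weighted_sum(code, 2) % 11   # least significant digit has weight 2
    if remainder == 0:
        return "0"
    check = 11 - remainder
    return "X" if check == 10 else str(check)  # "X" for 10 is an assumption


def append_check_digit(code):
    """Return the code with its check digit appended."""
    return code + modulus11_check_digit(code)
```

For the code 27935 the weighted sum is 102, the remainder is 3 and the check digit is 8, so append_check_digit("27935") returns "279358".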
• Example
  – code number 27935

  Code          2   7   9   3   5
  Weights       6   5   4   3   2
  Products     12  35  36   9  10
  Sum          102
  Sum/11       9 remainder 3
  Check Digit  11 - 3 = 8
  New Code     279358

• Validating a Modulus 11 number
  – the check digit is given a weighting of 1
  – multiply each digit in the code by its weight
  – add together all the products
  – divide this by 11
  – if the remainder is 0, then code number is correct
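Validation can be sketched in the same way. The function name is illustrative, and accepting "X" as a check digit worth 10 is an assumption (the slides do not say how a check value of 10 is written).

```python
def is_valid_modulus11(number):
    """True if the weighted sum, with the check digit weighted 1, divides by 11."""
    if len(number) < 2:
        return False
    body, check = number[:-1], number[-1]
    if not (body.isascii() and body.isdigit()):
        return False
    if check == "X":                      # assumed notation for a check value of 10
        check_value = 10
    elif check.isascii() and check.isdigit():
        check_value = int(check)
    else:
        return False
    values = [int(d) for d in body] + [check_value]
    # Rightmost position (the check digit) has weight 1, the next 2, and so on.
    total = sum(v * w for w, v in enumerate(reversed(values), start=1))
    return total % 11 == 0
```

The code 279358 gives a weighted sum of 110 and is accepted. Interchanging two adjacent digits (297358) or miscopying one (279458) changes the remainder, so both are rejected.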
• Verifying the code number 279358

  Code          2   7   9   3   5   8
  Weights       6   5   4   3   2   1
  Products     12  35  36   9  10   8
  Sum          110
  Sum/11       10 remainder 0

  Check digit is correct, therefore the code number is valid.
• The larger the divisor (and hence the remainder), the greater the system's degree of security.
  – convenient to use 23 as divisor
    • largest prime below 26
    • gives remainder of 0-22 which can be expressed as letters of the alphabet
    • keeps the overall length of the code down
  – Few other check digit systems are put into practical use.
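A divisor of 23 with a letter as the check character might look like the sketch below. The slides give neither the weights nor the letter mapping, so everything here is an assumption: the weights follow the Modulus 11 pattern (2 for the least significant digit, 3 for the next, etc.), and the check values 0-22 are mapped to the letters A-W.

```python
import string

LETTERS = string.ascii_uppercase[:23]  # assumed mapping: A = 0, B = 1, ... W = 22


def modulus23_check_letter(code):
    """Compute a single check letter for a string of decimal digits."""
    if not (code.isascii() and code.isdigit()):
        raise ValueError("code must be a non-empty string of digits")
    total = sum(int(d) * w for w, d in enumerate(reversed(code), start=2))
    # Same rule as Modulus 11: remainder 0 gives 0, otherwise divisor - remainder.
    return LETTERS[(23 - total % 23) % 23]


def is_valid_modulus23(number):
    """True if the final letter matches the check letter of the leading digits."""
    if len(number) < 2:
        return False
    body, letter = number[:-1], number[-1]
    if not (body.isascii() and body.isdigit()):
        return False
    return modulus23_check_letter(body) == letter
```

Every one of the 23 possible check values fits in one character, so the code grows by a single position and no value has to be written with two digits.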
• Need a smooth means of dealing with errors & omissions.
• Number of ways:-
  – accept, flag and allow through with no alteration
    • corrected by error correction run
  – reject data and/or transaction, sending a message to originator
    • log erroneous data for later checking
  – accept item, but flag as erroneous
    • dealt with by procedure such as correction by special operators using VDU terminals
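The three ways of dealing with an erroneous item can be sketched as a routing step that runs after validation. The policy names, queue names and record layout are all hypothetical; the sketch only shows where each kind of item ends up.

```python
from enum import Enum


class Policy(Enum):
    PASS_THROUGH = 1  # accept, flag and allow through with no alteration
    REJECT = 2        # reject and send a message to the originator
    HOLD = 3          # accept, but flag as erroneous for operator correction


def new_queues():
    """Create the (hypothetical) destinations a record can be routed to."""
    names = ("accepted", "correction_run", "messages", "error_log",
             "operator_correction")
    return {name: [] for name in names}


def handle_record(record, errors, policy, queues):
    """Route one validated record according to the chosen policy."""
    if not errors:
        queues["accepted"].append(record)
    elif policy is Policy.PASS_THROUGH:
        queues["accepted"].append({**record, "flagged": True})
        queues["correction_run"].append((record, errors))   # fixed in a later run
    elif policy is Policy.REJECT:
        queues["messages"].append(f"Rejected: {'; '.join(errors)}")
        queues["error_log"].append((record, errors))        # kept for later checking
    else:
        queues["operator_correction"].append((record, errors))
```

Which policy applies is a design decision per data item: a flagged item can keep a batch moving, while a rejected one needs the originator to resubmit it.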