ETL TESTING CONCEPTS
Prerequisite For Learning ETL Testing
1. Experience in software testing
2. Good knowledge of databases (Oracle or SQL Server)
3. Good knowledge of UNIX or Linux commands
4. Basic knowledge of an ETL tool (Informatica, DataStage, ODI, etc.)
TESTING ENVIRONMENT FOR ETL
1. Linux/UNIX load server
2. ETL tool
3. WinSCP
4. PuTTY
5. Data warehouse DB (Netezza/Teradata)
TEST DATA PREPARATION
 MISSING DATA (NULL VALUES)
 DATA TYPE VALIDATION (PASSING A NON-NUMERIC VALUE IN A NUMERIC FIELD)
 FIELD LENGTH VALIDATION (PASSING A VALUE LONGER THAN THE DEFINED FIELD LENGTH)
 NO. OF COLUMNS VALIDATION (PASSING A RECORD WITH MORE COLUMN VALUES THAN THE COLUMNS DEFINED ON THE TABLE)
 DATE FORMATS (ACTUAL FORMAT DD/MM/YYYY, DATA PASSED AS YYYY/DD/MM)
 SSN WITH SEPARATORS, DATA PASSED WITHOUT SEPARATORS (123-45-2389 PASSED AS 123452389)
 PADDING OF DATA
 ORDER OF SEQUENCE FOR COLUMNS
 JUNK DATA (?, '' AND SPECIAL CHARACTERS)
 DATE FIELD RULE SAYS DOB MUST NOT BE > CURRENT DATE, BUT WE PASS TOMORROW'S DATE
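As an illustration, the statements below sketch how some of these records might be prepared; the STG_EMP staging table and its columns are hypothetical names used only for this example, not part of the original material.
Example:
-- Hypothetical staging table: STG_EMP(EMP_ID NUMBER, EMP_NAME VARCHAR2(30), DOB DATE, SSN VARCHAR2(11))
-- Missing data: NULL in a mandatory column
INSERT INTO STG_EMP (EMP_ID, EMP_NAME, DOB, SSN)
VALUES (101, NULL, DATE '1980-01-15', '123-45-2389');
-- Field length violation: value longer than the defined length of EMP_NAME
INSERT INTO STG_EMP (EMP_ID, EMP_NAME, DOB, SSN)
VALUES (102, RPAD('A', 60, 'A'), DATE '1980-01-15', '123-45-2389');
-- SSN passed without separators
INSERT INTO STG_EMP (EMP_ID, EMP_NAME, DOB, SSN)
VALUES (103, 'SMITH', DATE '1980-01-15', '123452389');
-- DOB rule violation: tomorrow's date
INSERT INTO STG_EMP (EMP_ID, EMP_NAME, DOB, SSN)
VALUES (104, 'JONES', TRUNC(SYSDATE) + 1, '123-45-2389');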
CONSTRAINT TESTING
During constraint testing, the objective is to validate unique
constraints, primary keys, foreign keys, indexes, and relationships. The
test script should include these validation points. Some ETL processes can
be developed to validate constraints during the loading of the warehouse.
The following schema details should also be verified:
- All table names
- Column names for each table
- Column types for each table
- Whether a column allows NULL or not
- Default definitions
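As a minimal sketch, the queries below cover these checks against the EMP table used later in this document; the DEPT parent table, the EMPNO primary key and the ENAME NOT NULL rule are assumptions made for illustration.
Example:
-- Schema details: column names, data types, nullability and defaults (Oracle dictionary view)
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, NULLABLE, DATA_DEFAULT
FROM USER_TAB_COLUMNS
WHERE TABLE_NAME = 'EMP';
-- Duplicate primary key values (assuming EMPNO is the primary key)
SELECT EMPNO, COUNT(*) FROM EMP GROUP BY EMPNO HAVING COUNT(*) > 1;
-- NULLs in a mandatory column (assuming ENAME is NOT NULL)
SELECT * FROM EMP WHERE ENAME IS NULL;
-- Orphaned foreign keys (assuming EMP.DEPTNO references DEPT.DEPTNO)
SELECT E.* FROM EMP E
WHERE E.DEPTNO IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM DEPT D WHERE D.DEPTNO = E.DEPTNO);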
SOURCE TO TARGET COUNT TESTING
The objective of the count test scripts is to determine if the record
counts in the source match the record counts in the target. Some ETL processes
are capable of capturing record count information such as records read, records
written, records in error, etc. If the ETL process used can capture that level of
detail and create a list of the counts, allow it to do so. This will save time during
the validation process. It is always a good practice to use queries to double
check the source to target counts.
Example:
1. Record count verification in Source as per the business rule.
SELECT COUNT(*) FROM
  (SELECT DEPTNO, AVG(SAL), MAX(SAL), MIN(SAL), SUM(SAL)
   FROM EMP
   GROUP BY DEPTNO);
2. Record count verification in Target
SELECT COUNT(*) FROM TRG_AGGR;
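If preferred, both counts can be compared in a single statement; the sketch below reuses the same source query and target table and is only one possible way to write it.
Example:
-- Source and target counts side by side; add WHERE SRC.CNT <> TGT.CNT to list only mismatches
SELECT SRC.CNT AS SOURCE_COUNT, TGT.CNT AS TARGET_COUNT
FROM (SELECT COUNT(*) AS CNT FROM
       (SELECT DEPTNO FROM EMP GROUP BY DEPTNO)) SRC,
     (SELECT COUNT(*) AS CNT FROM TRG_AGGR) TGT;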
SOURCE TO TARGET DATA VALIDATION TESTING
No ETL process is smart enough to perform source to target field-to-field
validation. This piece of the testing cycle is the most intensive and requires the most
thorough analysis of the data.
Example:
Data verification in Source as per the business rule.
SELECT DEPTNO, AVG(SAL), MAX(SAL), MIN(SAL), SUM(SAL)
FROM EMP
GROUP BY DEPTNO;
Data verification in Target
SELECT * FROM TRG_AGGR;
Data comparison between a source table and its target (assuming the structures match):
SELECT * FROM EMP
MINUS
SELECT * FROM TRG_EMP;
SELECT COUNT(*) FROM EMP
MINUS
SELECT COUNT(*) FROM TRG_EMP;
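Note that MINUS only reports rows that are in the first query and missing from the second, so the comparison is normally run in both directions; the sketch below assumes EMP and TRG_EMP share the same column structure.
Example:
-- Rows in source but missing or changed in target
SELECT * FROM EMP
MINUS
SELECT * FROM TRG_EMP;
-- Rows in target that have no matching row in source (unexpected extras)
SELECT * FROM TRG_EMP
MINUS
SELECT * FROM EMP;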
DATA VALIDATION TESTING
1. Verify that data is transformed correctly according to various
business requirements and rules.
2. Make sure that all projected data is loaded into the data warehouse
without any data loss or truncation.
3. Make sure that the ETL application appropriately rejects invalid data,
replaces it with default values, and reports it.
4. Make sure that data is loaded into the data warehouse within the prescribed
and expected time frames to confirm performance and scalability.
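As an illustration of points 2 and 3, the sketch below looks for truncated values and for rows replaced with a default; the ENAME column, the EMPNO join key and the 'UNKNOWN' default value are assumptions for this example.
Example:
-- Detect truncation of a text column between source and target
SELECT S.EMPNO, S.ENAME AS SRC_NAME, T.ENAME AS TGT_NAME
FROM EMP S JOIN TRG_EMP T ON S.EMPNO = T.EMPNO
WHERE LENGTH(S.ENAME) <> LENGTH(T.ENAME);
-- Count rows that were replaced with an assumed default value
SELECT COUNT(*) FROM TRG_EMP WHERE ENAME = 'UNKNOWN';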
BATCH SEQUENCE & DEPENDENCY TESTING
ETL in a data warehouse is essentially a set of processes that must
execute in a particular sequence. Dependencies exist among the various
processes, and respecting them is critical to maintaining the integrity of
the data. Executing the sequence in the wrong order can result in
inaccurate data in the warehouse. The testing process must include at
least two iterations of end-to-end execution of the whole batch sequence,
and data integrity must be checked during this testing. The most common
errors caused by an incorrect sequence are referential integrity failures,
incorrect end-dating (if applicable), and rejected records.
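After each end-to-end batch run, a referential integrity check along these lines can be executed; FACT_SALES, DIM_CUSTOMER and the CUST_KEY surrogate key are assumed names used only for illustration.
Example:
-- Fact rows whose dimension key does not resolve, a typical symptom of
-- loading the fact table before its dimension in the batch sequence
SELECT F.*
FROM FACT_SALES F
WHERE NOT EXISTS (SELECT 1 FROM DIM_CUSTOMER D WHERE D.CUST_KEY = F.CUST_KEY);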
JOB RESTART TESTING
In a real production environment, ETL jobs/processes fail for a
number of reasons (for example, database failures or connectivity
failures), and a job can fail when it is only partly executed. A good
design always allows the jobs to be restarted from the point of failure.
Although this is more of a design suggestion/approach, it is recommended
that every ETL job be built and tested for restart capability.
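If the ETL design maintains a job audit/control table, restart behaviour can be checked against it after killing and restarting a job; the ETL_JOB_AUDIT table, its columns and the job name below are purely assumed for illustration.
Example:
-- Latest run status and reported row counts for the restarted job
SELECT JOB_NAME, STATUS, ROWS_INSERTED, START_TIME, END_TIME
FROM ETL_JOB_AUDIT
WHERE JOB_NAME = 'LOAD_TRG_EMP'
ORDER BY START_TIME DESC;
-- Target row count to compare with the audit figures after the restart
SELECT COUNT(*) FROM TRG_EMP;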
DUPLICATE TESTING
Duplicate testing must be performed at each stage of the ETL
process and in the final target table. This testing involves checks for
duplicate rows and also checks for multiple rows with the same primary
key, neither of which can be allowed.
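The sketch below assumes EMPNO is the primary key of the target table TRG_EMP and ENAME/DEPTNO are its remaining columns.
Example:
-- Fully duplicate rows (list all columns of the table in the GROUP BY)
SELECT EMPNO, ENAME, DEPTNO, COUNT(*)
FROM TRG_EMP
GROUP BY EMPNO, ENAME, DEPTNO
HAVING COUNT(*) > 1;
-- Multiple rows sharing the same primary key
SELECT EMPNO, COUNT(*)
FROM TRG_EMP
GROUP BY EMPNO
HAVING COUNT(*) > 1;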
PERFORMANCE TESTING
It is the most important aspect after data validation.
Performance testing should check whether the ETL process completes
within the load window.
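Where the ETL tool writes run statistics to an audit table, the load window can be checked with a query along these lines; the ETL_JOB_AUDIT structure (with DATE columns) and the 4-hour window are assumptions.
Example:
-- Jobs whose elapsed time exceeded an assumed 4-hour load window
SELECT JOB_NAME, START_TIME, END_TIME,
       (END_TIME - START_TIME) * 24 AS ELAPSED_HOURS
FROM ETL_JOB_AUDIT
WHERE (END_TIME - START_TIME) * 24 > 4;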
VOLUME TESTING
Verify that the system can process the maximum expected
quantity of data for a given cycle in the time expected.
NEGATIVE TESTING
Negative testing checks whether the application fails where it
should fail, using invalid inputs and out-of-boundary scenarios, and
verifies the behavior of the application in those cases.
INITIAL & INCREMENTAL LOAD TESTING
Type 1 - Update a source value, run the ETL process, and verify the change was applied.
Type 2 - Update a source value, run the ETL process, and verify the change was applied
to a new row and the old row was end-dated.
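A Type 2 result can be verified roughly as follows; TRG_EMP_DIM with EMPNO as the natural key and the CURRENT_FLAG/EFF_END_DT columns are an assumed dimension structure, not from the original text.
Example:
-- Exactly one current row per natural key
SELECT EMPNO, COUNT(*)
FROM TRG_EMP_DIM
WHERE CURRENT_FLAG = 'Y'
GROUP BY EMPNO
HAVING COUNT(*) <> 1;
-- Historical rows must be end-dated
SELECT EMPNO
FROM TRG_EMP_DIM
WHERE CURRENT_FLAG = 'N' AND EFF_END_DT IS NULL;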
INCREMENTAL & SCD TYPE2 IMPLEMENTATION TESTING
There are a number of universal verifications that have to be carried out for any
kind of data warehouse testing. Below is the list of items that are treated as
essential for validation in ETL testing:
- Verify that data transformation from source to destination works as expected
- Verify that the expected data is added in the target system
- Verify that all DB fields and field data are loaded without any truncation
- Verify data checksums for record count matches (see the sketch after this list)
- Verify that proper error logs with full details are generated for rejected data
- Verify NULL value fields
- Verify that duplicate data is not loaded
- Verify data integrity
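The checksum-style check in the list above can be approximated by comparing column-level aggregates between source and target; the sketch below uses the EMP/TRG_EMP tables from the earlier examples and assumes SAL and DEPTNO exist in both.
Example:
-- Aggregate 'checksums' over numeric columns; the two result rows should be identical
SELECT COUNT(*) AS ROW_CNT, SUM(SAL) AS SAL_SUM, SUM(DEPTNO) AS DEPTNO_SUM FROM EMP
UNION ALL
SELECT COUNT(*), SUM(SAL), SUM(DEPTNO) FROM TRG_EMP;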
ETL Testing Challenges:
ETL testing is quite different from conventional testing. There are many
challenges we faced while performing data warehouse testing. Here is a list of a
few ETL testing challenges I experienced on my project:
- Incompatible and duplicate data.
- Loss of data during the ETL process.
- Unavailability of an inclusive test bed.
- Testers have no privileges to execute ETL jobs on their own.
- The volume and complexity of the data are very high.
- Faults in business processes and procedures.
- Trouble acquiring and building test data.
- Missing business flow information.
Incremental (Delta) Load Test
Tootia on Tue Aug 30, 2011 12:00 am
Hi all,
It is my first post in this forum. We have recently started to implement a dimensional data warehouse for our
organisation and I am the main person in charge of implementing the ETL jobs. We use SAP BO DataServices as the
ETL tool. My question is: what is the best and most efficient way of testing the Delta load (whether for Dimension or Fact tables)?
As each table has its corresponding SQL script (as mapping rules), it is easy to test the Initial load. We just check
the target table in the DW against its SQL script in a test case (written and run in MS SQL Management Studio). However,
testing the Delta load is time consuming and not as straightforward as Initial load testing.
For each job there are usually several tables which contribute to loading data into the DW table. In the Delta load job, I
have a condition for each source table: if the record's modified date is greater than the last ETL load, the record
should be picked up and contribute to the load; otherwise not.
To fully test the job, I should insert and update a record in each source table, run the job and see whether the
record has been passed through (affected the target record) or not.
If we have only 4 tables as the source of a job, we need at least 8 test cases to test the Delta load: one test
case to insert a new record in each table and one test case to update an existing record in each source table. 4 * 2 = 8
test cases.
That's why I find testing the Delta load tedious and frustrating. I'm wondering if there is a better way to fully
test the ETL job for the Incremental load, rather than the way I explained above.
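One way to cut down the manual effort described above is to script the delta condition itself and compare source against target; the sketch below is only an outline, with SRC_CUSTOMER, its MODIFIED_DATE column, DIM_CUSTOMER, its LOAD_DATE column and the ETL_LOAD_CONTROL table all assumed names (EXCEPT is the SQL Server equivalent of Oracle's MINUS, since the post mentions MS SQL Management Studio).
Example:
-- Source rows changed since the last ETL run that were not (re)loaded into the target;
-- run one such check per contributing source table
SELECT S.CUSTOMER_ID
FROM SRC_CUSTOMER S
WHERE S.MODIFIED_DATE > (SELECT MAX(LAST_ETL_RUN_DATE) FROM ETL_LOAD_CONTROL)
EXCEPT
SELECT T.CUSTOMER_ID
FROM DIM_CUSTOMER T
WHERE T.LOAD_DATE > (SELECT MAX(LAST_ETL_RUN_DATE) FROM ETL_LOAD_CONTROL);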