Assignment: ETL
The In-Class Exercise involved a scenario where you brought together two different data sets from two
sources. Each data set contained a group of orders by a group of customers, and those customers did
not overlap (no customer was in both data sets).
For this assignment, you’ll be building on that data set by adding new fields to the “Full Set” worksheet.
Instead of adding new rows, this time you’ll be adding new columns. The data will come from the
“Source 3” worksheet (also in the workbook).
Guidelines




You must submit (1) the answers to the questions on the last page of this document, and (2) the
final version of your “ETL Exercise.xlsx” workbook.
You must include your name at the top of the document.
Your answers should be emailed, as an attachment, to your instructor and the TA
(tuc22942@temple.edu) with the subject:
MIS2502: ETL Assignment
The email must be sent by the start of class the day the assignment is due.
If you do not follow these instructions, your assignment will be counted late.
Evaluation
Your submission will be graded based on two factors:


The correctness of the answers to the questions.
The accuracy of the “Full Set” worksheet in the “ETL Exercise.xlsx” workbook.
Part 1: Credit Line field
Add the credit line data to the “Full Set” worksheet. A minimum credit line of $2,000 has been
established, so any customer whose credit line is below that amount, even $0, is raised to $2,000. Use the
VLOOKUP() function to bring this data into the “Full Set” worksheet. You’ll notice that even if you do it
correctly, some cells will show errors (“#N/A” values).
Question 1: Which customer doesn’t have data when you apply the VLOOKUP() function?
Question 2: Explain why this is causing a problem.
Now make the necessary change to the Source 3 worksheet to correct the issue so that Credit Line data
appears for all the customers.
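If it helps to see the Part 1 logic outside of Excel, here is a minimal Python sketch. It assumes an exact-match lookup (what VLOOKUP() does when its last argument is FALSE) and uses made-up customer IDs and amounts, not the values in your workbook. The helper names vlookup_exact and new_credit_line are hypothetical and exist only for this illustration.

```python
# Hypothetical stand-in for a lookup table: customer ID -> raw credit line.
# These IDs and amounts are invented; they are not taken from the workbook.
source = {"C001": 5000, "C002": 0, "C003": 1500}

MIN_CREDIT_LINE = 2000  # the minimum stated in the assignment


def new_credit_line(raw):
    """Raise any credit line below the minimum up to the minimum."""
    return max(raw, MIN_CREDIT_LINE)


def vlookup_exact(key, table):
    """Mimic an exact-match VLOOKUP(): return the matching value,
    or "#N/A" when the key has no exact match in the table."""
    return table.get(key, "#N/A")


# Transform first, then look up, mirroring the order of steps in the exercise.
transformed = {cust: new_credit_line(amount) for cust, amount in source.items()}

for cust in ["C001", "C002", "C004"]:
    print(cust, vlookup_exact(cust, transformed))
```

In this sketch the lookup succeeds only when the key in the lookup table matches the requested key exactly; any other key comes back as “#N/A”.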
Part 2: Missed Payments field
Add the data for missed payments to the “Full Set” worksheet. In the Source 3 worksheet, if a customer has
no missed payments, their value for that field is “NONE”.
First, transform the data in the Source 3 worksheet by creating a formula for the “Missed Payments 2”
column. That column should only have numeric data (you can use the IF() function to do this – use the
explanation of the IF() function below and the example in the “New Credit Line” column as a guide).
Once you do the transformation, use the VLOOKUP() function to bring the data in the “Missed Payments
2” column into the “Full Set” worksheet.
Question 3: Write the data transformation rule for the missed payment field (not the syntax of the IF()
function; just explain the criteria you used to transform the data).
Part 3: Country Field
Add the data for the country field to the “Full Set” worksheet. Notice that the United States is
represented several different ways. Choose one, and transform the remaining data so that the value for
the United States is consistent across all customers. Use the “Country 2” column to hold the
transformed data. Use an IF() statement to transform the data. Then use the VLOOKUP() function to
bring the data into the “Full Set” worksheet.
Question 4: Write the data transformation rule for the country field (not the syntax of the IF() function;
just explain the criteria you used to transform the data).
Looking at the IF() function:


IF(logical_test, value_if_true, value_if_false) is an Excel function that places a value in a cell
depending on the outcome of a test.
So let’s say you had the function =IF(A2="HELLO","HI THERE","GOODBYE") in cell B2. That says the
following: if the value in A2 is “HELLO”, put “HI THERE” in cell B2. Otherwise, put “GOODBYE” in
cell B2.
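The same decision can be sketched in Python for readers who find code easier to follow than a cell formula. The function excel_if and the variables a2 and b2 are hypothetical names standing in for IF() and the two cells; this is an illustration, not something you need for the assignment.

```python
def excel_if(logical_test, value_if_true, value_if_false):
    """Mimic Excel's IF(): return one of two values depending on a test."""
    return value_if_true if logical_test else value_if_false


a2 = "HELLO"  # stands in for the contents of cell A2
b2 = excel_if(a2 == "HELLO", "HI THERE", "GOODBYE")
print(b2)

a2 = "SOMETHING ELSE"
b2_other = excel_if(a2 == "HELLO", "HI THERE", "GOODBYE")
print(b2_other)
```

One difference to keep in mind: Excel’s = comparison on text ignores upper and lower case, while Python’s == does not.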
Using the IF() function with OR():


You can test for several things at once using OR().
For example, now let’s say you had =IF(OR(A2="HELLO",A2="HI"),"HI THERE","GOODBYE") in cell B2.
In this case, if the value in A2 is “HELLO” or “HI”, then “HI THERE” is put in cell B2. Otherwise,
“GOODBYE” is put in cell B2.
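As a rough Python equivalent of the IF() with OR() example, the sketch below tests one value against two alternatives. The names excel_if, excel_or, and greeting are hypothetical helpers made up for this illustration.

```python
def excel_if(logical_test, value_if_true, value_if_false):
    """Mimic Excel's IF(): return one of two values depending on a test."""
    return value_if_true if logical_test else value_if_false


def excel_or(*tests):
    """Mimic Excel's OR(): true when at least one of the tests is true."""
    return any(tests)


def greeting(a2):
    """Apply the example formula to a value standing in for cell A2."""
    return excel_if(excel_or(a2 == "HELLO", a2 == "HI"), "HI THERE", "GOODBYE")


for value in ["HELLO", "HI", "BYE"]:
    print(value, "->", greeting(value))
```

The point of OR() is that several acceptable inputs can share one result without writing a separate IF() for each of them.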
NAME: _______________________________________
Answer Sheet
(enter your answers below and submit this along with the completed Excel workbook)
Question 1: Which customer doesn’t have data when you apply the VLOOKUP() function?
Question 2: Explain why this is causing a problem.
Question 3: Write the data transformation rule for the missed payment field (not the syntax of the IF()
function; just explain the criteria you used to transform the data).
Question 4: Write the data transformation rule for the country field (not the syntax of the IF() function;
just explain the criteria you used to transform the data).