
SQL Training

Introduction to Writing SQL
This 7 week course is intended as a self-guided training course for individuals or groups who want to learn how to write custom SQL
queries against the Data Warehouse.
The textbook for the class is Mastering Oracle SQL, 2nd Edition by Sanjay Mishra & Alan Beaulieu (ISBN: 0596006322), and the SQL
portion of the class follows the book closely.
Each week, you’ll read one chapter and do a short homework assignment to review & practice what you learned. Not doing this
reading & homework will jeopardize your ability to follow the course, as each week builds on those before. It’s also recommended
you keep a business question specific to your role in mind, and try answering it as you develop your SQL & ETL skills.
Section      SQL Topics
Week 1       SELECT & FROM Clauses; Table and Column Aliases; Types of Elements; Concatenating; ORDER BY Clause
Week 2       WHERE Clause – SQL's Filter; Using Comparison Operators; Using Other Operators; Handling Cells with No Data – aka NULLs
Week 3       Aggregate Queries; Pulling DISTINCT Records; Aggregate Functions; GROUP BY and HAVING Clauses
Week 4       Joining 2 or More Tables; INNER vs. OUTER Joins; WHERE Clause Conditions w/ OUTER Joins; One-to-Many Joins
Week 5       DATE vs. DATETIME columns; The TO_CHAR() Function with Dates; The TRUNC() Function with Dates; The TO_DATE() Function; Using BETWEEN with Dates; Other Date functions
Week 6       Subqueries; Avoiding 1-to-many joins
Week 7       The DECODE(), CASE & NVL Functions; Data Type Consistency
Answer Key (08/04/2011)
Key Tables & Virts (08/09/2011)
Week 1 – Basic Structure of SQL
SQL Topics:
 SELECT & FROM
 Table Aliases
 Column Aliases with and without Spaces
 Types of Column Content
 Concatenating
 ORDER BY
SQL TOPICS
SQL is a language used (for our purposes) to ask a Database for specific information in a specific format. We generally use it to pull
sets of data, known as Result Sets, which we use to create reports and answer business questions, such as “What was the List Price
value of all goods shipped to customers in 2008?” or “When did we receive the first unit of Twilight Book 1 from Hachette?”.
There are two necessary sections, or clauses, to an SQL query: SELECT and FROM
SELECT & FROM
SELECT tells the database a list of one or more Elements that you want to include in your results
FROM tells the database a list of one or more tables (or views) from which you want to pull the information
A basic query with just these necessary clauses might be:
SELECT WAREHOUSE_ID
, NAME
FROM D_WAREHOUSES;
This query would pull two elements – in this case columns
WAREHOUSE_ID and NAME - from the table named
D_WAREHOUSES, producing a Result Set like this one. Note
that only part of the result set is shown:
WAREHOUSE_ID  NAME
IMJO          Ingram Micro, Jonestown, PA
TUL1          Coffeyville
GCWP          Granite City Tools – MN
NRT3          Ichikawa
A00L          SED International-Dallas
MSC7          Bemrose Booth
ECEL          Amazon Wireless
TAJ9          Target: Light Source
WAREHOUSE_ID and NAME are the column names that we wanted to pull, so we listed them as elements in the SELECT clause.
Notice that the elements in the SELECT clause are separated by commas in the SQL query. The query also ends with a semicolon,
which lets Oracle know that’s the end of the query. ETL Manager doesn’t require this, but it’s good practice to include it.
You may also notice that I put the comma at the beginning of the new line, followed by the next element, which may seem a little counterintuitive. This is because the comma is only there because there is a second element. If I wanted to delete the NAME
column, I'd also need to delete the comma; otherwise I'd get an error. By putting the comma at the start of the line where the new element is, I can easily delete that whole line when editing and won't miss the comma and cause an error.
SELECT WAREHOUSE_ID
FROM D_WAREHOUSES;
Table Aliases
For reasons that will become apparent when you start joining multiple tables together, it’s best to use Table Aliases when writing
your SQL. A Table Alias is a shorthand name, like a nickname, that tells the query which table each referenced column comes from.
To alias a table, you simply add your nickname after the table name, with a space separating them (fcs in the example below). You
also put that alias at the start of each column name, separated from the column name by a period, like this:
SELECT WAREHOUSE_ID
, NAME
FROM D_WAREHOUSES;
BECOMES:
SELECT fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs;
The table alias ensures that the system knows exactly which table each column comes from. For example, in a more complex query you
might have two tables, each with a WAREHOUSE_ID column, and Oracle needs to know to which table the column is associated.
Column Aliases
Another type of alias is the Column Alias. This is a way to change the name of a column to something that’s more meaningful to you
or your customers, and is what shows up as the Column Headers in your result set. A few examples of Column aliases are below:
SELECT fcs.WAREHOUSE_ID FC_Code
, fcs.REGION_ID AS Region
, fcs.NAME AS "FC Name"
FROM D_WAREHOUSES fcs;
There are two ways you can change the name of a column. The first is to simply put a space between the column name and your
alias, as we've done above to alias the first column, WAREHOUSE_ID, to FC_Code. You can also put the word AS in
between the column name and your alias, as we've done to alias the second column, REGION_ID, to Region. Including AS isn't
necessary, but it arguably makes it clearer that you've aliased the column. If you want to alias a column to something that has a
space in it, as we've done to alias the column NAME to FC Name, you have to enclose it in double quotes, so the system knows
where the alias starts and ends. I usually avoid including spaces in column aliases, because they can lead to problems in more complex
queries, and the standard is to use underscores, as we did with FC_Code. Also, be sure not to start your Column Aliases with a
number, or make them a SQL keyword (like DATE, CUBE or FROM), as this will cause confusing errors.
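For example, aliasing the DW_CREATION_DATE column as DATE, or aliasing REGION_ID as 1REGION, would both throw errors, because DATE is a SQL keyword and 1REGION starts with a number. A slight rename avoids the problem:

-- fcs.DW_CREATION_DATE AS DATE and fcs.REGION_ID AS 1REGION would both error out
-- (DATE is a keyword; 1REGION starts with a number), so rename them slightly instead:
SELECT fcs.DW_CREATION_DATE AS CREATION_DATE
, fcs.REGION_ID AS REGION_1
FROM D_WAREHOUSES fcs;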
Types of Elements
A SELECT clause can include elements beyond just columns from tables. There are a number of different elements that can be
included, depending on your needs, including:
 Literal values, such as numbers (13) or text strings ('Howdy!'), that return exactly what you enter
 Expressions (aka formulas), such as doi.QUANTITY_SHIPPED + 5, which do math or other logical procedures
 Function calls, such as TO_CHAR(ddo.ORDER_DAY,'MM/DD/YYYY'), that transform column information
 Pseudocolumns, such as ROWID, ROWNUM, or LEVEL
(Pseudocolumns aren’t columns that actually exist in any table, but are columns you can include in any query for specific uses.)
An example of a query with each of these types of columns is:
SELECT fcs.WAREHOUSE_ID
, 13
, 'Howdy!'
, fcs.REGION_ID + 5
, SUBSTR(fcs.NAME,0,5)
, ROWNUM
FROM D_WAREHOUSES fcs;
Which yields a result set that includes these rows:
WAREHOUSE_ID  13  'HOWDY!'  FCS.REGION_ID+5  SUBSTR(FCS.NAME,0,5)  ROWNUM
TGC2          13  Howdy!    6                Alen                  1
AAOP          13  Howdy!    6                Trend                 5
SAIN          13  Howdy!    6                Saint                 7
JACK          13  Howdy!    6                Jacks                 8
WCSC          13  Howdy!    6                West                  9
ABE1          13  Howdy!    6                Allen                 757
IBEW          13  Howdy!    6                Ingra                 969
LEX1          13  Howdy!    6                Lexin                 970
SEA1          13  Howdy!    6                Seatt                 971
You’ll notice that the 13 wasn’t enclosed in single quotation marks, like Howdy!. This is because the system understands the 13 is a
number. Text strings, like ‘Howdy!’ or ‘PHL1’ or ‘I love SQL’ need to be enclosed in single-quotes whenever you use them. Note that
the quotes don’t show up in your results. Also – the single quote in MS Word, Excel and Outlook is a different character, so it’s best
not to edit SQL in these programs (stick to Notepad, Notepad++, or the ETL Manager Profile SQL window). Notepad++ is a favorite of
many coders, as you can adjust the ‘Language’ to SQL and get helpful formatting, as well as indent entire sections with tab. It’s
available for download in Advertised Programs as “Open Source Notepad++”.
Concatenating
You can also concatenate information together in your select clause, including columns, numbers, text strings, etc. Unlike Excel,
where you concatenate using the ampersand (&) symbol, SQL uses two pipes ||. To get two pipes, hold down shift and hit the key
just above the Enter key on your keyboard twice. An example of concatenating is in the query below:
SELECT fcs.WAREHOUSE_ID
, 'Howdy! from ' || fcs.NAME
FROM D_WAREHOUSES fcs;
Which yields a result set that includes:
WAREHOUSE_ID  'HOWDY!FROM'||FCS.NAME
SBTK          Howdy! from Softbank BB
DGJP          Howdy! from Digital Goods JP
ABGM          Howdy! from Step2 UK Limited
ABGL          Howdy! from Universal Cycles
LHR2          Howdy! from Plot 8 - Marston Gate
Notice that you need to include the space after 'Howdy! from' inside the quotation marks in order to get it into the results; otherwise, the results would look like:
WAREHOUSE_ID  'HOWDY!FROM'||FCS.NAME
SBTK          Howdy! fromSoftbank BB
DGJP          Howdy! fromDigital Goods JP
ABGM          Howdy! fromStep2 UK Limited
ABGL          Howdy! fromUniversal Cycles
LHR2          Howdy! fromPlot 8 - Marston Gate
ORDER BY
Sometimes, the order of the results is important to answering your question or to displaying the results in the most meaningful way,
like ranking the highest units at the top, or alphabetizing a list of vendor codes. To order your results, you add an ORDER BY clause
to the end of the query, and then specify in that clause which element(s) to order your results by, and even which direction. For
example, you might want to see a list of warehouse names in alphabetical order:
SELECT fcs.NAME
FROM D_WAREHOUSES fcs
ORDER BY fcs.NAME;
NAME
2 Red Hens
3 GIRLS DESIGN/KITTY A GO
32 North Corp
A & W Products Co In
A C R Logistics
A Plus Marketing
A'HOMESTEAD SHOPPE INC
A-America Inc
A.S. Diamonds
AAB Gourmet - Garden City
ABM Corp - Mira Loma
Beijing
Notice that numbers, spaces & symbols are ordered before letters, so 2 Red Hens comes before A.S. Diamonds, which comes before
AAB Gourmet. The default ordering is Ascending (0-9,A-Z).
You may also want to sort a list in the other direction. A common use case is when you want the highest number of something at
the top of a list, like the highest number of glance views in a list of ASINs. To do that, you add a space and the word DESC after the
element in your ORDER BY clause, to specify descending order:
SELECT fcs.NAME
FROM D_WAREHOUSES fcs
ORDER BY fcs.NAME DESC;
NAME
Beijing
A'HOMESTEAD SHOPPE INC
ABM Corp - Mira Loma
A-America Inc
AAB Gourmet - Garden City
A.S. Diamonds
A Plus Marketing
A C R Logistics
A & W Products Co In
32 North Corp
3 GIRLS DESIGN/KITTY A GO
2 Red Hens
You may also want to order your results by multiple elements, in which case you include them all in your ORDER BY clause, in order
of importance, separated by commas:
SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
FROM D_WAREHOUSES fcs
ORDER BY fcs.REGION_ID DESC
, fcs.WAREHOUSE_ID;
REGION_ID  WAREHOUSE_ID
3          AARF
3          AARG
3          CAN1
3          DEKN
3          DGJP
3          NRT1
2          GLA1
2          LEJ1
2          LHR1
1          KTKN
1          MYTK
1          RNO1
Each element indicated in your ORDER BY clause can be sorted in a different direction. In the example above, we ordered the
REGION_ID column descending, and ordered the WAREHOUSE_ID column ascending (which is the default). You can also use any
type of element in your ORDER BY clause, just like in your SELECT clause, including function calls and expressions.
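For example, you could order the warehouse list by the first five characters of the name (reusing the SUBSTR call from the earlier example) and then by REGION_ID in descending order, like this:

SELECT fcs.WAREHOUSE_ID
, fcs.REGION_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
ORDER BY SUBSTR(fcs.NAME,0,5)  -- ordering by a function call
, fcs.REGION_ID DESC;          -- then by a column, descending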
Week 1 Homework:
1. Read Chapter One in Mastering Oracle SQL.
2. Create a query that pulls an alphabetized list of Warehouse IDs from the table D_WAREHOUSES, changing the name of the Warehouse ID column to 'FC'.
3. Edit the query to add the column REGION_ID, add an element called 'CALC' that multiplies the Region ID by 10, add an element called 'FACTOR' that is populated with the number 10 for all records, and add an element called FC_REGION that concatenates the WAREHOUSE_ID and the REGION_ID columns with an underscore in between (e.g. PHL1_1).
Here is a description of the D_WAREHOUSES table. We'll talk more about exploring tables & columns in the future, and what all this
information means, but for now, all you need to know is the list of column names, so you can play around a bit with querying
this table using ETL Manager.
Table Name: D_WAREHOUSES

Column Name            Data Type  Data Length  Nullable?  Num Distinct
CAN_SHIP_INTERNALLY    CHAR       1            N          2
DB_NAME                VARCHAR2   8            Y          57
DW_CREATION_DATE       DATE       7            N          19
DW_LAST_UPDATED        DATE       7            N          1
HAS_AMAZON_INVENTORY   CHAR       1            N          2
IP_ADDRESS_LIST_ID     NUMBER     22           Y          51
IS_DELAYED_ALLOCATION  CHAR       1            N          2
IS_DROPSHIP            CHAR       1            N          2
IS_RETURNS_ONLY        CHAR       1            N          2
NAME                   VARCHAR2   50           N          3340
REGION_ID              NUMBER     22           N          3
WAREHOUSE_ID           CHAR       4            N          3453
Remember: One of the great things about SQL is that there are usually several ways to get to the same answer. Different people’s
minds think about and solve problems in different ways, and you’ll likely find some methods that work for you that may be different
than what your peers are doing. A good SQL coder is a creative SQL coder, so don’t be afraid to try something ‘off-book’.
Week 2 – Building Queries to Pull Just the Results You Want
SQL Topics
 The WHERE Clause – SQL’s Filter
 Using Comparison Operators
 Using Other Operators
 Handling Cells with No Data – aka NULLs
SQL TOPICS
The WHERE Clause – SQL’s Filter
Although the SELECT and FROM clauses are the only required sections of a SQL query, they only allow you to pull every record from
a table – not just the ones you want. Imagine querying the D_CUSTOMER_SHIPMENT_ITEMS table, which BI Metadata shows has
over 5 billion rows of data. The output would be too large for Excel, and you’d have a lot of information you don’t really want.
That’s where the WHERE clause comes in.
I think of WHERE as the filters I put on the table, to filter out what I don’t want, and only let what I do want get through to my result
set. Each ‘filter’ in the where clause is a Condition that must be true in order to be returned by the query.
The WHERE clause goes after the FROM clause, but before the ORDER BY clause (if you're using one). The WHERE clause starts
with the word WHERE, and then is followed by one or more filters, called conditions. For example, if we wanted to pull an
alphabetical list of FCs in Japan & China (REGION_ID 3), we could run the following:
SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID = 3
ORDER BY fcs.WAREHOUSE_ID;
This would return the following dataset, limited to only WHERE
the REGION_ID is equal to 3:
REGION_ID  WAREHOUSE_ID  NAME
3          AARF          ¿¿¿¿
3          AARG          Amazon¿¿
3          CAN1          Guangzhou
3          DEKN          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3          DGJP          Digital Goods JP
3          FFSA          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3          FMTT          ¿¿¿¿¿¿¿¿¿
3          FUOS          ¿¿¿¿¿¿¿¿¿¿¿¿¿
3          KCFK          Kenko.com, INC.
3          KTKN          ¿¿¿¿¿¿¿
3          MYTK          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3          NRT1          Narita
3          NRT2          Yachiyo-shi
3          NRT3          Ichikawa
3          OSKF          Osakaya Books
3          OTOS          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3          PEK3          Beijing
3          SBTK          Softbank BB
3          SHA1          Shanghai
3          YYGF          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿
Of course, you’re not limited to just one condition. You may want to filter you results by several criteria, and so would have multiple
conditions in the WHERE clause. For example, we might limit the above query further to only those FCs with WAREHOUSE_IDs that
start with the letter Y. (Don’t worry about what LIKE ‘Y%’ means exactly right now – we’ll get to that shortly. Just know it means
‘starts with Y’):
SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID = 3
AND fcs.WAREHOUSE_ID LIKE 'Y%'
ORDER BY fcs.WAREHOUSE_ID;

Yielding the following result set:

REGION_ID  WAREHOUSE_ID  NAME
3          YYGF          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿
Notice that we separated the two conditions in the WHERE clause with AND. This means that BOTH the first condition
(fcs.REGION_ID = 3) AND the second condition (fcs.WAREHOUSE_ID LIKE ‘Y%’) must be true.
You can also separate multiple conditions with OR, in which case either condition must be true. If we change the AND in our query
above to an OR, the results are much different:
SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID = 3
OR fcs.WAREHOUSE_ID LIKE 'Y%'
ORDER BY fcs.WAREHOUSE_ID;
In this case, we pulled all FCs where the REGION_ID is equal to 3
OR where the WAREHOUSE_ID begins with Y, so we got all the
FCs in Region 3 regardless of what letter they start with, plus all
the FCs in other regions that start with Y – which happens to
only include YAHA.
REGION_ID  WAREHOUSE_ID  NAME
3          AARF          ¿¿¿¿
3          AARG          Amazon¿¿
3          CAN1          Guangzhou
3          DEKN          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3          DGJP          Digital Goods JP
3          FFSA          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3          FMTT          ¿¿¿¿¿¿¿¿¿
3          FUOS          ¿¿¿¿¿¿¿¿¿¿¿¿¿
3          KCFK          Kenko.com, INC.
3          KTKN          ¿¿¿¿¿¿¿
3          MYTK          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3          NRT1          Narita
3          NRT2          Yachiyo-shi
3          NRT3          Ichikawa
3          OSKF          Osakaya Books
3          OTOS          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3          PEK3          Beijing
3          SBTK          Softbank BB
3          SHA1          Shanghai
1          YAHA          Yamazaki Tableware -- Hackettstown
3          YYGF          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿
And you can even get more complex by using parentheses to change how the AND and OR logic is applied. For example, maybe you
want a list of all FCs where the REGION_ID is 3 and either the WAREHOUSE_ID starts with Y or it starts with D. We could do that
using parentheses:
SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID = 3
AND (fcs.WAREHOUSE_ID LIKE 'Y%'
OR
fcs.WAREHOUSE_ID LIKE 'D%')
ORDER BY fcs.WAREHOUSE_ID;
REGION_ID  WAREHOUSE_ID  NAME
3          DEKN          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3          DGJP          Digital Goods JP
3          YYGF          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿
The results include only FCs where REGION_ID = 3 AND where
either the WAREHOUSE_ID starts with Y OR where the
WAREHOUSE_ID starts with D. Pages 20-22 of your textbook have additional examples and some charts on how various logical combinations of AND and OR, with and without parentheses, are evaluated.
Using Comparison Operators
In the examples above, we used two different ‘comparison operators’ in our WHERE clauses to limit our results: the equals symbol
(=) and LIKE. There are many more comparison operators available to help us apply conditions in our query.
The equals sign can be used to evaluate if something is equal to something else in a condition, as we did when we put fcs.REGION_ID
= 3 in our WHERE clause in the example above. Equality can also be evaluated for columns that contain text strings (CHAR and
VARCHAR2 data type columns), in which case you must put that text string in a set of single quotation marks. For example:
SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.WAREHOUSE_ID = 'PHL1';
REGION_ID  WAREHOUSE_ID  NAME
1          PHL1          New Castle
You can also evaluate whether something is NOT equal to something else, using either of two symbols: <> or !=. If we changed the
operator in the query above from = to !=, the query would give you all FCs except for PHL1.
SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.WAREHOUSE_ID != 'PHL1';
If you want records that are greater than or less than a certain value, you can use the > and < symbols, as you do the =. You can also
evaluate if something is greater than or equal to using >=, and evaluate if something is less than or equal to using <=. And just like =
and !=, they can be used on text strings, too.
SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.WAREHOUSE_ID >= 'WYTN';
REGION_ID  WAREHOUSE_ID  NAME
1          YAHA          Yamazaki Tableware -- Hackettstown
1          WYTN          WYNIT, Inc.
The operator IN can also be used in a WHERE clause, when you have a list of things that you want to check for. To pull data from
D_WAREHOUSES where the FC was either PHL1 or RNO1, we could do it with two conditions, this way:
SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.WAREHOUSE_ID = 'PHL1'
OR fcs.WAREHOUSE_ID = 'RNO1';
REGION_ID  WAREHOUSE_ID  NAME
1          PHL1          New Castle
1          RNO1          Fernley
Or we could get the same results with a single condition by using the IN operator:
SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.WAREHOUSE_ID IN ('PHL1','RNO1');
REGION_ID  WAREHOUSE_ID  NAME
1          PHL1          New Castle
1          RNO1          Fernley
When you use the IN operator, you follow it with a list of values inside a set of parentheses, separated by commas. The condition
could be read as WHERE the WAREHOUSE_ID matches any of the values in the list ‘PHL1’,’RNO1’, so it returns information for any
record where WAREHOUSE_ID matches any value in the list. With just two values, as in the example, either the OR or IN method
takes about the same amount of time to write – but when you have many more values to evaluate, such as a list of 50 vendor codes,
IN becomes much quicker. (Note that the upper limit on values in the list of an IN condition is reportedly 1000.)
You can also query for records that are NOT IN a list, by putting NOT in front of IN. If we changed the condition in the last example
from IN to NOT IN, we’d get every FC except PHL1 and RNO1.
SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.WAREHOUSE_ID NOT IN ('PHL1','RNO1');
Another great shortcut operator is BETWEEN. To query for warehouses PHL1 and PHL2, we can query:
This way:

SELECT fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE (fcs.WAREHOUSE_ID = 'PHL1'
OR fcs.WAREHOUSE_ID = 'PHL2');

Or this way:

SELECT fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.WAREHOUSE_ID IN ('PHL1','PHL2');

Or even this way:

SELECT fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.WAREHOUSE_ID >= 'PHL1'
AND fcs.WAREHOUSE_ID <= 'PHL2';
But imagine that you had a very long range, such as a date range of many weeks, and only wanted to pull a portion of them. You
wouldn’t want to have to list all of them. The quicker way to do this type of query would be to use the BETWEEN operator.
To use the BETWEEN operator, you follow the column you’re evaluating by the word BETWEEN, then the first value in the range,
followed by AND, and finally the last value in the range. It’s important to remember that the BETWEEN operator is Inclusive,
meaning your results will include anything between the numbers AND anything that matches the numbers.
SELECT fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.WAREHOUSE_ID BETWEEN
'PHL1' AND 'PHL3';
WAREHOUSE_ID  NAME
PHL1          New Castle
PHL2          Chambersburg
PHL3          Centerpoint

By using BETWEEN, we can return PHL1 and PHL3 (the ends of the range) and PHL2 – which falls between them alphabetically.
Yet another comparison operator that we used earlier is LIKE. The LIKE operator evaluates matching for columns with text strings
(CHAR and VARCHAR columns), and is usually used with a ‘pattern matching character’. The two ‘pattern matching characters’ (aka
wildcards) are % and _. The percent (%) symbol matches to a string of characters of any length, whereas the underscore (_) symbol
matches to any one character. Now our previous example of FCs starting with the letter Y should make more sense:
SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID = 3
AND fcs.WAREHOUSE_ID LIKE 'Y%';
REGION_ID  WAREHOUSE_ID  NAME
3          YYGF          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿
So we are looking for any FC where REGION_ID = 3 and WAREHOUSE_ID is like a text string that starts with a letter Y, and is followed
by any number of characters. If we wanted to be more specific, we could query for any FC that starts with PH and ends with 1 using
the single character wildcard:
SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.WAREHOUSE_ID LIKE 'PH_1';
REGION_ID  WAREHOUSE_ID  NAME
1          PHL1          New Castle
1          PHX1          Phoenix
1          PH01          LaserShip Philly
So we get back the three FCs that start with PH, end with 1, and have a single character in between: PHL1, PHX1 and PH01.
The LIKE operator can also be negated, like IN, with the addition of NOT:
SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.WAREHOUSE_ID NOT LIKE 'PH_1';
Using Other Operators in the WHERE clause
Just like in the SELECT clause, you can use mathematical operators like +, -, and * in the WHERE clause to evaluate conditions. The
following query would return all FCs in Region 2, given that 2+1 = 3. An odd example, but I promise this is useful when you begin
using dates.
SELECT fcs.REGION_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID + 1 = 3;
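As a preview of why this matters with dates (we'll cover date handling properly later in the course), date arithmetic works the same way: SYSDATE is Oracle's current date and time, and subtracting a number from a date moves it back that many days. A sketch like the one below would pull POs ordered in roughly the last 7 days – just remember to check BI Metadata for a table's partitioned columns before running something like this.

SELECT ddo.ORDER_ID
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.ORDER_DAY >= TRUNC(SYSDATE) - 7;  -- 7 days before today (TRUNC trims off the time portion)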
Handling Cells with No Data – aka NULLs
A NULL is a blank cell. A void. Nothing. Nada. Zilch. But not Zero. Zero is something, which represents nothing. Confused yet?
REGION_ID  WAREHOUSE_ID  NAME
1          PHL4          Carlisle
1          PHL5          
0          PHL1          New Castle
In the imaginary table above, the third record has a REGION_ID that is 0. But that second record has a NAME value that is NULL. It’s
empty. Since something can never be equal to nothing, you can’t use many of the usual conditional operators to evaluate whether a
record in a column is NULL. So SQL has special operators for NULLs, and some special functions for dealing with them, too.
If we wanted to look for any records in D_WAREHOUSES where the IP Address is NULL, we’d use the IS NULL operator:
SELECT fcs.REGION_ID
, fcs.IP_ADDRESS_LIST_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID = 3
AND fcs.IP_ADDRESS_LIST_ID IS NULL;
REGION_ID  IP_ADDRESS_LIST_ID  WAREHOUSE_ID  NAME
3                              NRT1          Narita
3                              AARG          Amazon¿¿
3                              AARF          ¿¿¿¿
3                              OSKF          Osakaya Books
3                              KCFK          Kenko.com, INC.
3                              SBTK          Softbank BB
3                              DGJP          Digital Goods JP
3                              OTOS          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3                              YYGF          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3                              MYTK          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3                              FFSA          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3                              FMTT          ¿¿¿¿¿¿¿¿¿
3                              DEKN          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3                              FUOS          ¿¿¿¿¿¿¿¿¿¿¿¿¿
3                              KTKN          ¿¿¿¿¿¿¿
And just like IN and LIKE, you can negate IS NULL by sticking in a NOT:
SELECT fcs.REGION_ID
, fcs.IP_ADDRESS_LIST_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID = 3
AND fcs.IP_ADDRESS_LIST_ID IS NOT NULL;
The IS NOT query will return the opposite results – all FCs where the IP Address field isn’t blank.
Since some columns have nulls (and we can tell which by the Nullable field in BI Metadata), and since <> or != operators will exclude
NULLs, you have to be careful sometimes if you want all records where a field is EITHER NULL or is not equal to a specified value.
You could write two conditions in your WHERE clause to evaluate the same column, like this:
SELECT fcs.REGION_ID
, fcs.IP_ADDRESS_LIST_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID = 3
AND (fcs.IP_ADDRESS_LIST_ID IS NULL
OR fcs.IP_ADDRESS_LIST_ID != 1035);
But thankfully, SQL has a handy function called NVL, which translates any NULL values to another value that you specify, so you can
use standard comparison operators to evaluate the column in a single condition, without much extra work.
SELECT fcs.REGION_ID
, fcs.IP_ADDRESS_LIST_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID = 3
AND NVL(fcs.IP_ADDRESS_LIST_ID,0) != 1035;
The format of the NVL function is NVL, followed by a
parenthesis, inside of which are your column name, a comma,
and then what you want nulls to be translated to. In the
example above, we translated any nulls in the column
IP_ADDRESS_LIST_ID to the number 0. Then, we evaluate the
results for whether they are not equal to 1035. Since the nulls
are converted to 0, they are not equal to 1035, and will appear
in the results.
REGION_ID  IP_ADDRESS_LIST_ID  WAREHOUSE_ID  NAME
3                              AARF          ¿¿¿¿
3                              AARG          Amazon¿¿
3          1039                CAN1          Guangzhou
3                              DEKN          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3          1041                NRT3          Ichikawa
3                              OSKF          Osakaya Books
3                              OTOS          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿
3          1040                PEK3          Beijing
3                              SBTK          Softbank BB
3          25                  SHA1          Shanghai
3                              YYGF          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿

Had we left out the NVL function, the results would be very different:

SELECT fcs.REGION_ID
, fcs.IP_ADDRESS_LIST_ID
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID = 3
AND fcs.IP_ADDRESS_LIST_ID != 1035;

REGION_ID  IP_ADDRESS_LIST_ID  WAREHOUSE_ID  NAME
3          1040                PEK3          Beijing
3          25                  SHA1          Shanghai
3          1039                CAN1          Guangzhou
3          1041                NRT3          Ichikawa
And NVL can also be used in the SELECT clause, to replace NULLs with something more meaningful to the audience of the data. For
example, we might change any NULLS in the IP Address column to a zero, like this:
SELECT fcs.REGION_ID
, NVL(fcs.IP_ADDRESS_LIST_ID,0)
, fcs.WAREHOUSE_ID
, fcs.NAME
FROM D_WAREHOUSES fcs
WHERE fcs.REGION_ID = 3
AND fcs.IP_ADDRESS_LIST_ID IS NULL
AND fcs.WAREHOUSE_ID LIKE '___K';
REGION_ID  NVL(FCS.IP_ADDRESS_LIST_ID,0)  WAREHOUSE_ID  NAME
3          0                              SBTK          Softbank BB
3          0                              KCFK          Kenko.com, INC.
3          0                              MYTK          ¿¿¿¿¿¿¿¿¿¿¿¿¿¿-
Week 2 Homework:
1. Read Chapter Two in Mastering Oracle SQL.
2. Make sure you're signed up to the etl-users@amazon.com mailing list (and have an Outlook rule in place for those emails).
3. Create a query that pulls a list of warehouses that are in North America (Region 1) and have Amazon inventory from the D_WAREHOUSES table. Be sure to run an Explain Plan on the query before running it.
4. Edit the query to add the FC Name as an element called 'FC Name', and include only FCs with the word 'Logistics' in their name. Remember to run an Explain Plan first.
5. Check out the table PRODUCT_GROUPS in BI Metadata. How many rows does it have? How many columns? Which columns might have NULLs in them? What type of information is in the PRODUCT_GROUP column? What type of table is it?
6. Create a query that pulls a list of GL Product Group Codes, in numerical order, from the PRODUCT_GROUPS table. Include the column SHORT_DESC in your results, and replace any null values in that column with the word 'Unknown'. (Although there isn't a column explicitly named GL_PRODUCT_GROUP in the table, one of the columns contains this information. Use BI Metadata and look at the Data Types of the columns, and make an educated guess about which column to pull.)
7. Edit the query to include the DESCRIPTION column, to only return results with a GL Product Group value of at least 14, and only return results with a DESCRIPTION in the following list: Books, Universal, Shops, Advertising, or Art.
8. If you haven't studied logic, or are having difficulty wrapping your head around the difference between the results you'd get from WHERE A AND B OR C and WHERE A AND (B OR C), do a little Googling on logic. Concepts like Modus Ponens and Modus Tollens will aid you greatly in writing and understanding SQL.
9. Run Explain plans on the following queries, but DO NOT RUN THEM. These are good examples of bad queries:
SELECT ddo.order_id
FROM d_distributor_orders ddo
, d_warehouses fcs
WHERE ddo.warehouse_id =
fcs.warehouse_id;
SELECT ddo.order_id
FROM d_distributor_orders ddo
, d_distributor_order_items doi;
Week 3 – Aggregate Queries and HAVING Clause
SQL Topics
 Aggregate Queries
 Pulling DISTINCT Records
 Aggregate Functions
 Aggregate Functions with DISTINCT
 The GROUP BY Clause
 The HAVING Clause
SQL TOPICS
Aggregate Queries
So far, we’ve created queries that pull all rows of data from a table using SELECT and FROM and used the WHERE clause to limit
which rows we pull. Now we’re going to aggregate (group together) multiple rows of data into a single row in the result set, using
the DISTINCT keyword, the GROUP BY clause, and some aggregate operators.
DISTINCT
The simplest form of aggregate query is one where you simply want to know all the unique values in a certain column of a table. For
example, you might want a list of all the possible values for the REGION_ID column in D_WAREHOUSES, so you know how to limit
your query properly. There are over 3000 rows of data in D_WAREHOUSES, but you can use DISTINCT to pull only the unique values
for the REGION_ID column.
To do that, write your query as you would to pull all the records for that column, but put the word DISTINCT after the SELECT but
before the column, like this:
SELECT /*+ use_hash(fcs) */
DISTINCT
fcs.REGION_ID
FROM D_WAREHOUSES fcs;
REGION_ID
1
2
3
The query returns 3 rows of data, one for each DISTINCT value in the REGION_ID column. Even though each value is in the table
many times in many records, the addition of the DISTINCT keyword limits the results to only the unique values.
If your SELECT clause has multiple elements, DISTINCT will return all the unique combinations of elements. Now that you know that
the values in REGION_ID are 1,2, and 3, you might want to know whether each Region has Delayed Allocation warehouses or not. To
do this, you again put DISTINCT before your first column:
SELECT /*+ use_hash(fcs) */
DISTINCT
fcs.REGION_ID
, fcs.IS_DELAYED_ALLOCATION
FROM D_WAREHOUSES fcs;
REGION_ID  IS_DELAYED_ALLOCATION
1          Y
1          N
2          N
3          N
Now the results tell us that Region 1 has some warehouses that are Delayed Allocation nodes (Y) and some that are not (N), but the other two
regions only have warehouses that are not Delayed Allocation nodes. There are two rows for REGION_ID 1, because the value in
IS_DELAYED_ALLOCATION for each is distinct, and DISTINCT finds all unique combinations of all the elements in the SELECT clause.
(Notice that you only include the DISTINCT keyword once, after the first element, even when there are multiple elements.)
Other examples of using DISTINCT would be to find out what all the unique ORDER_TYPE values are in D_DISTRIBUTOR_ORDERS, or
to find a list of all ASINs we’ve ordered from a specific vendor in the past 6 months, and which of those were ever backordered.
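For instance, the ORDER_TYPE question might look something like the sketch below. (I'm limiting it to one REGION_ID and a single ORDER_DAY just to keep the scan small – as always, check BI Metadata for the table's actual partitioned columns before running it.)

SELECT /*+ use_hash(ddo) */
DISTINCT
ddo.ORDER_TYPE
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.ORDER_DAY = to_date('20090316','YYYYMMDD');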
Aggregate Functions
The DISTINCT function can help you get lists of unique values, and even answer some business questions, but you’ll also find you
want to count the number of POs placed, or sum the total quantity we ordered on a PO, or find the first date that we received
something from a vendor. All of these require the use of aggregate functions.
The main aggregate functions are
 COUNT – which counts how many values there are in a column
 MAX – which finds the maximum value in a column
 MIN – which finds the minimum value in a column
 SUM – which adds together the values in a column
 AVG – which averages the values in a column
All of these functions are used in the SELECT clause. The format is to start with the function, and then put the column you want to
aggregate in parentheses after it, like SUM(doi.QUANTITY) or COUNT(fcs.REGION_ID). Make sure to put the table alias inside the
function along with the column name. If we wanted to COUNT how many records there are in the D_WAREHOUSES table, we could
write the following:
SELECT /*+ use_hash(fcs) */
COUNT(fcs.NAME)
FROM D_WAREHOUSES fcs;
COUNT(FCS.NAME)
4960

Using COUNT to count the number of records in the NAME column, we know that there are 4960 records in the D_WAREHOUSES
table, without having to pull all the records and count them ourselves.
If we wanted to know when the first and last dates that a record was entered into the D_WAREHOUSE table, we could use the MIN
and MAX functions:
SELECT /*+ use_hash(fcs) */
MIN(fcs.DW_CREATION_DATE)
, MAX(fcs.DW_CREATION_DATE)
FROM D_WAREHOUSES fcs;
MIN(FCS.DW_CREATION_DATE)  MAX(FCS.DW_CREATION_DATE)
1/20/2009 17:45            8/8/2011 7:07

From the results, we learn that the first record was created by DataWarehouse on 1/20/2009, and the last record was created on
8/8/2011. Notice that the same column name (DW_CREATION_DATE) was evaluated in both fields of the SELECT clause, but in the
first field we ran the MIN function on that column, and in the second field we ran the MAX function on that column.
The SUM function allows you to add up everything in a column, and get a total. One example might be if you want to know how
many units we submitted on a specific PO. We can find that out using the SUM function:
SELECT /*+ use_hash(doi) */
SUM(doi.QUANTITY_SUBMITTED)
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
WHERE doi.REGION_ID = 1
AND doi.ORDER_DAY =
to_date('20090319','YYYYMMDD')
AND doi.ORDER_ID = 'C4811075';
SUM(DOI.QUANTITY_SUBMITTED)
37
If we want to know the average number of units submitted on that PO, we could exchange out the SUM function for AVG:
SELECT /*+ use_hash(doi) */
AVG(doi.QUANTITY_SUBMITTED)
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
WHERE doi.REGION_ID = 1
AND doi.ORDER_DAY =
to_date('20090319','YYYYMMDD')
AND doi.ORDER_ID = 'C4811075';
AVG(DOI.QUANTITY_SUBMITTED)
7.4
This shows us that for the ASINs we ordered on PO C4811075, the average number of units ordered was 7.4.
Putting all these functions together, we could learn a lot about the PO in one query:
SELECT /*+ use_hash(doi) */
COUNT(doi.ISBN)
, MIN(doi.QUANTITY_SUBMITTED)
, MAX(doi.QUANTITY_SUBMITTED)
, SUM(doi.QUANTITY_SUBMITTED)
, AVG(doi.QUANTITY_SUBMITTED)
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
WHERE doi.REGION_ID = 1
AND doi.ORDER_DAY = to_date('20090319','YYYYMMDD')
AND doi.ORDER_ID = 'C4811075';
COUNT(DOI.ISBN)  MIN(DOI.QUANTITY_SUBMITTED)  MAX(DOI.QUANTITY_SUBMITTED)  SUM(DOI.QUANTITY_SUBMITTED)  AVG(DOI.QUANTITY_SUBMITTED)
5                1                            32                           37                           7.4
We learn that we ordered 5 ASINs on this PO, that the minimum units ordered was 1 but the maximum was 32, that we ordered 37
units total, and the average was 7.4.
Aggregate Functions with DISTINCT
Sometimes, particularly with the COUNT function, you’ll want to find out how many unique records are in a table, which might be
different than the count of total records. For example, in the table D_WAREHOUSES, we learned earlier that there are 4960
records, by using COUNT to count the NAME column.
SELECT /*+ use_hash(fcs) */
COUNT(fcs.NAME)
FROM D_WAREHOUSES fcs;
COUNT(FCS.NAME)
4960
However, some of those names might repeat, so there might not be 4960 unique NAME values in the table. To find that out, we
combine DISTINCT with our aggregate function, but this time putting it inside the function, before the column name. In this
example, we put DISTINCT inside the COUNT function, to COUNT the DISTINCT values in the fcs.NAME column:
SELECT /*+ use_hash(fcs) */
COUNT(DISTINCT fcs.NAME)
FROM D_WAREHOUSES fcs;
COUNT(DISTINCTFCS.NAME)
4790
By adding DISTINCT to our COUNT function, we find that although there are 4960 values in the NAME column, there are only 4790
DISTINCT values in that column, so some must repeat.
GROUP BY
So far, we’ve aggregated information for a whole table (in the case of D_WAREHOUSES) and for a set of records limited by the
WHERE clause (as in our queries D_DISTRIBUTOR_ORDER_ITEMS to learn about PO C4811075). Now we’ll talk about how to use
those same aggregate functions to group sets of records together for each unique value in certain columns, while aggregating other
columns. For example, we might want to know how many units we ordered on each PO we placed with Wiley on a given day.
We could query how many units we ordered from Wiley on 3/16/2009, like this:
SELECT /*+ use_hash(doi) */
SUM(doi.QUANTITY_SUBMITTED)
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
WHERE doi.REGION_ID = 1
AND doi.LEGAL_ENTITY_ID = 101
AND doi.ORDER_DAY =
to_date('20090316','YYYYMMDD')
AND doi.DISTRIBUTOR_ID = 'WILEY';
SUM(DOI.QUANTITY_SUBMITTED)
57218
But that doesn’t tell us for each PO. Rather than run the query once for each PO, we can add ORDER_ID to the SELECT clause and
add the GROUP BY clause with ORDER_ID, so the query SUMs up the number of units ordered for each PO. The GROUP BY clause
comes after the WHERE clause (but before the ORDER BY clause, if you're using one), and indicates which columns from your
SELECT clause you want to group the results by. To group our query above by PO, we’d add it to the SELECT clause and to the
GROUP BY clause, like this:
SELECT /*+ use_hash(doi) */
doi.ORDER_ID
, SUM(doi.QUANTITY_SUBMITTED)
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
WHERE doi.REGION_ID = 1
AND doi.LEGAL_ENTITY_ID = 101
AND doi.ORDER_DAY =
to_date('20090316','YYYYMMDD')
AND doi.DISTRIBUTOR_ID = 'WILEY'
GROUP BY doi.ORDER_ID;
ORDER_ID   SUM(DOI.QUANTITY_SUBMITTED)
M7444521   1
P4010601   3
R7453213   57203
U1897503   5
Q0625613   6
Now we know how much we ordered on each of the 5 POs we placed with Wiley on that day.
We can add additional columns to our SELECT clause to get more information. If they’re an aggregate column, such as a COUNT or
AVG function, then we don’t need to put them in the GROUP BY clause. But if they aren’t an aggregate column, we’ll need to also
add them to the GROUP BY clause. For example, we could add a COUNT of DISTINCT ASINs on each PO, as well as add the STATUS
column - which is an ASIN level attribute in the table that indicates whether that ASIN was Backordered (BO) or not on that PO. For
each PO, there may be some ASINs that are backordered, and some that aren’t.
SELECT /*+ use_hash(doi) */
doi.ORDER_ID
, doi.STATUS
, SUM(doi.QUANTITY_SUBMITTED)
, COUNT(DISTINCT doi.ISBN)
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
WHERE doi.REGION_ID = 1
AND doi.LEGAL_ENTITY_ID = 101
AND doi.ORDER_DAY = to_date('20090316','YYYYMMDD')
AND doi.DISTRIBUTOR_ID = 'WILEY'
GROUP BY doi.ORDER_ID
, doi.STATUS;
ORDER_ID   STATUS  SUM(DOI.QUANTITY_SUBMITTED)  COUNT(DISTINCTDOI.ISBN)
M7444521           1                            1
P4010601           3                            1
Q0625613   BO      2                            2
Q0625613           4                            3
R7453213   BO      1296                         416
R7453213           55907                        5930
U1897503           5                            1
We don’t need to put COUNT(DISTINCT doi.ISBN) in the GROUP BY clause, because that column includes an aggregate function. But
we do need to put doi.STATUS in the GROUP BY clause, because it doesn’t include an aggregate function. You’ll notice that since we
grouped by two columns, we got some additional rows of data. That’s because POs Q0625613 and R7453213 both had some ASINs
that were backordered, and some that were not. Our SUM and COUNT data is now grouped by both ORDER_ID and STATUS.
For people familiar with Excel Pivot tables, it can be helpful to think of queries using GROUP BY as something like a Pivot table, with
certain fields being grouped and certain columns being summed, counted, averaged, etc. Each time you add in a new level of
grouping, the columns being aggregated change.
A GROUP BY clause is only needed if you have BOTH Aggregate functions and non-Aggregate elements in your SELECT clause. One
easy way to make sure they’re in synch is to copy all the elements in your SELECT clause and paste them in your GROUP BY clause,
then delete any elements with Aggregate functions (SUM, COUNT, MIN, etc). (You also need to delete any Column Aliases from the
GROUP BY clause.)
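For example, in the query below the GROUP BY clause is just the SELECT list copied down, with the SUM(...) element and the column aliases stripped out:

SELECT /*+ use_hash(doi) */
doi.ORDER_ID PO
, doi.STATUS PO_STATUS
, SUM(doi.QUANTITY_SUBMITTED) UNITS   -- aggregate element: not repeated below
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
WHERE doi.REGION_ID = 1
AND doi.LEGAL_ENTITY_ID = 101
AND doi.ORDER_DAY = to_date('20090316','YYYYMMDD')
AND doi.DISTRIBUTOR_ID = 'WILEY'
GROUP BY doi.ORDER_ID   -- no PO alias here
, doi.STATUS;           -- and no PO_STATUS alias here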
The HAVING Clause
Once you begin aggregating, you’ll find that you may want to limit your results to only records where the result of an aggregation
meets a certain criterion. For example, we might only want to look at POs where we ordered 1 unit on the entire PO. We can't do this
in the WHERE clause, because the conditions in the WHERE clause are evaluated before we aggregate.
Going back to our earlier example, where we summed the units submitted on all POs for Wiley on 3/16/2009, if we tried to find all
POs where we only ordered one unit on the entire PO by limiting the WHERE clause, we’d get the wrong results:
SELECT /*+ use_hash(doi) */
doi.ORDER_ID
, SUM(doi.QUANTITY_SUBMITTED)
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
WHERE doi.REGION_ID = 1
AND doi.LEGAL_ENTITY_ID = 101
AND doi.ORDER_DAY =
to_date('20090316','YYYYMMDD')
AND doi.DISTRIBUTOR_ID = 'WILEY'
AND doi.QUANTITY_SUBMITTED = 1
GROUP BY doi.ORDER_ID;
ORDER_ID   SUM(DOI.QUANTITY_SUBMITTED)
M7444521   1
Q0625613   4
R7453213   1811
We’re actually looking at all the POs WHERE we only ordered one unit of at least one ASIN on that PO, and then summing the
quantities of those ASINs – which we can see because the SUM of the QUANTITY_SUBMITTED on two POs is greater than one. This
is a totally valid query, but it doesn't answer the question we were asking: which POs submitted to Wiley on 3/16/09 had only one
unit submitted on the entire PO? To get the answer to that, we use a HAVING clause.
A HAVING clause is put at the end of an aggregate query, after the GROUP BY, to limit the results AFTER the aggregation is done. It’s
a filter, just like the WHERE clause, but the filtering is done after things are summed and counted and averaged.
SELECT /*+ use_hash(doi) */
doi.ORDER_ID
, SUM(doi.QUANTITY_SUBMITTED)
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
WHERE doi.REGION_ID = 1
AND doi.LEGAL_ENTITY_ID = 101
AND doi.ORDER_DAY =
to_date('20090316','YYYYMMDD')
AND doi.DISTRIBUTOR_ID = 'WILEY'
GROUP BY doi.ORDER_ID
HAVING SUM(doi.QUANTITY_SUBMITTED) = 1;
ORDER_ID   SUM(DOI.QUANTITY_SUBMITTED)
M7444521   1
One way to think about having is to imagine the results of the query if we’d run it without the HAVING clause, then filter those by
the conditions in the HAVING clause. We actually ran this query without the HAVING clause in an earlier example, getting:
ORDER_ID   SUM(DOI.QUANTITY_SUBMITTED)
M7444521   1
P4010601   3
R7453213   57203
U1897503   5
Q0625613   6
So we could expect the result we got – only PO M7444521 had just a single unit ordered on the entire PO. It’s worth noting that
since the HAVING clause adds a second round of filtering to the query, it can add a lot of time to the query, too.
Week 3 Homework:
1. Read Chapter 4 in Mastering Oracle SQL.
2. Check out the table D_DISTRIBUTOR_ORDERS in the BI Metadata. How is it Partitioned?
3. Create a query to count how many POs have been created for the vendor code PRBRC in the US, using the table D_DISTRIBUTOR_ORDERS. Remember to use BI Metadata to determine which columns are Partitioned, and make sure you include those in your WHERE clause. Run the query through an Explain Plan before running it.
4. Add elements to find the first ORDER_DAY and the last ORDER_DAY for PRBRC POs.
5. Add a column to count how many distinct ORDER_DAY values there are.
6. Add a column to sum up the total SHIPPING_COST.
7. Add DISTRIBUTOR_ID as an element of your SELECT clause. You'll need to add a GROUP BY clause, since this is not an aggregated column.
8. Add HANDLER as an element in the SELECT clause. Who created the most POs for PRBRC? When was the last time danac created one?
9. Use the HAVING clause to limit the results to only handlers who created between 30 and 40 PRBRC POs.
10. Further limit the results to only handlers who created PRBRC POs on at least 10 different days.
11. Rerun the query, but this time, have it publish to the folder \\ant\dept\BMVDSA\Books\ETL_Practice\, rather than email you the results. Have the file name include both the Job Profile and Job Run Wildcards, with the appropriate (.txt) extension for a tab delimited text file.
Week 4 - Joining Tables
SQL Topics
 Joining 2 or More Tables – Old School & New School Approaches
 INNER Joins vs. OUTER Joins
 WHERE Clause Conditions with OUTER Joins
 One-to-Many Joins
SQL TOPICS
Joining 2 or More Tables – Old School & New School Approaches
Getting data out of one table is great, but ETL allows you the flexibility to join multiple tables in the Data Warehouse together and
pull custom data sets that meet your business needs. With the rollout of version 9i of Oracle, a new method of joining tables was
introduced, which is what our text, and I, will use. But you’ll surely run into code that uses the old syntax, so I recommend reading
the Appendix on page 449 of Mastering Oracle SQL, so you aren’t left confused when you find commas in the FROM clause and (+) in
the WHERE clause. There are several advantages to the new syntax that you can read about in your text, and I feel it’s easier to
understand than the old syntax.
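Just so you recognize the old syntax when you see it, here is what the VENDORS-to-O_AMAZON_BUSINESS_GROUPS join we're about to build would look like written the 'Old School' way: the tables are simply separated by commas in the FROM clause, the join condition sits in the WHERE clause, and an outer join is marked by putting (+) on the side that's allowed to have no match. (This is only a sketch for recognition purposes – the Appendix walks through the old syntax in full.)

-- Old-school inner join: comma-separated tables, join condition in the WHERE clause
SELECT v.VENDOR_NAME
, o_abg.TYPE
FROM VENDORS v
, O_AMAZON_BUSINESS_GROUPS o_abg
WHERE v.AMAZON_BUSINESS_GROUP_ID = o_abg.ID
AND v.PRIMARY_VENDOR_CODE = 'RANDO';

-- Old-school outer join: the (+) marks the table that may be missing a match
SELECT v.VENDOR_NAME
, o_abg.TYPE
FROM VENDORS v
, O_AMAZON_BUSINESS_GROUPS o_abg
WHERE v.AMAZON_BUSINESS_GROUP_ID = o_abg.ID (+)
AND v.PRIMARY_VENDOR_CODE = 'RANDO';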
The ‘New School’ approach to joining tables uses the FROM clause to indicate which tables you want information from AND how
they are joined together. For example, if I wanted to join the VENDORS table (which has lots of great Vendor Master data) to the
O_AMAZON_BUSINESS_GROUPS table to translate the AMAZON_BUSINESS_GROUP_ID number into the description of the business
group that I’m familiar with, I’d do the following:
SELECT /*+ use_hash(v,o_abg) */
v.VENDOR_ID
, v.PRIMARY_VENDOR_CODE
, v.VENDOR_NAME
, v.AMAZON_BUSINESS_GROUP_ID
, o_abg.TYPE
FROM VENDORS v
JOIN O_AMAZON_BUSINESS_GROUPS o_abg
ON v.AMAZON_BUSINESS_GROUP_ID = o_abg.ID
WHERE v.PRIMARY_VENDOR_CODE = 'RANDO';
VENDOR_ID  PRIMARY_VENDOR_CODE  VENDOR_NAME   AMAZON_BUSINESS_GROUP_ID  TYPE
3453       RANDO                Random House  1                         US Books
The syntax is to start your FROM clause and enter the name and alias of the first table. Then specify the type of join (in this case a
standard inner JOIN) and the name and alias of the second table. Follow that by the word ON, and then indicate which columns
define the join between your two tables, with an equals sign between them. Above, we joined the VENDORS table to the
O_AMAZON_BUSINESS_GROUPS table
FROM VENDORS v
JOIN O_AMAZON_BUSINESS_GROUPS o_abg
and returned results where the AMAZON_BUSINESS_GROUP_ID in VENDORS is equal to the ID in O_AMAZON_BUSINESS_GROUPS.
ON v.AMAZON_BUSINESS_GROUP_ID = o_abg.ID
You can also join ON multiple columns between two tables, by adding them to the ON clause, separated by AND:
SELECT /*+ use_hash(ddo,doi) */
ddo.ORDER_ID
, doi.ISBN
, doi.QUANTITY_SUBMITTED
FROM D_DISTRIBUTOR_ORDERS ddo
JOIN D_DISTRIBUTOR_ORDER_ITEMS doi
ON ddo.ORDER_ID = doi.ORDER_ID
AND ddo.DISTRIBUTOR_ID = doi.DISTRIBUTOR_ID
WHERE ddo.REGION_ID = 1
AND ddo.ORDER_ID = 'N9161983'
AND doi.REGION_ID = 1
AND doi.ORDER_DAY = TO_DATE('20090312','YYYYMMDD');
ORDER_ID   ISBN        QUANTITY_SUBMITTED
N9161983   0321357973  1
*When you join two tables, and both have Partitioning Schemes, be sure to include conditions in your WHERE clause to ensure
you’re making use of the partitions in both tables.*
You can also join 3 or more tables together, of course, by specifying the JOIN type and JOIN ON condition for each additional table:
SELECT /*+ use_hash(ddo,doi) */
ddo.DISTRIBUTOR_ID
, v.VENDOR_NAME
, ddo.ORDER_ID
, doi.ISBN
, doi.QUANTITY_SUBMITTED
FROM D_DISTRIBUTOR_ORDERS ddo
JOIN D_DISTRIBUTOR_ORDER_ITEMS doi
ON ddo.ORDER_ID = doi.ORDER_ID
AND ddo.DISTRIBUTOR_ID = doi.DISTRIBUTOR_ID
JOIN VENDORS v
ON ddo.DISTRIBUTOR_ID = v.PRIMARY_VENDOR_CODE
WHERE ddo.REGION_ID = 1
AND ddo.ORDER_ID = 'N9161983'
AND doi.REGION_ID = 1
AND doi.ORDER_DAY = TO_DATE('20090312','YYYYMMDD');
DISTRIBUTOR_ID  VENDOR_NAME               ORDER_ID   ISBN        QUANTITY_SUBMITTED
PEAED           Pearson Technology Group  N9161983   0321357973  1
You can see that we joined ddo to doi on two columns, and we joined ddo to v on one column. Each table needs to be joined to at
least one other table to avoid a Cartesian join.
Notice that as you begin joining multiple tables, you can begin including columns from all the tables as elements in your SELECT
clause, and include conditions in your WHERE clause on columns from each of those tables. This is where the need for table aliases
becomes clear – to let Oracle know that you want the DISTRIBUTOR_ID from D_DISTRIBUTOR_ORDERS, not VENDORS.
INNER Joins vs. OUTER joins
There are two main types of JOINs used in writing SQL: INNER and OUTER JOINs.
INNER JOINs will likely be what you use most often, and they are the default join type (thus you only need to type JOIN to use one). They
return only results where the condition specified in your JOIN ON section is true. In other words, it returns only records where it
finds matching records in both tables. In the example above, the INNER JOIN limits the results to only return records from the table
VENDORS that match to records in the table O_AMAZON_BUSINESS_GROUPS where the join condition
v.AMAZON_BUSINESS_GROUP_ID = o_abg.ID is true. Because INNER JOIN is the default join type, any query where the join type is
simply JOIN is actually an INNER JOIN.
An OUTER JOIN is used when you want to join two tables but you want all the records from one table and any results from the
second table that match. OUTER JOINs can be of two main types, which seem confusing at first, but are really quite simple: LEFT and
RIGHT OUTER JOINs.
One way to think about the differences between INNER and OUTER joins
is with a Venn diagram, where each circle represents a table.
An INNER JOIN (or simply JOIN) selects only those records that have
values in common between both tables (the grey section, labeled B).
An OUTER JOIN selects all records from the primary table, and any
matching records for the secondary table (where the secondary table has
values in common with the primary table). A LEFT OUTER JOIN (or simply
LEFT JOIN) would select A + B, whereas a RIGHT JOIN would select B + C.
The ON condition(s) specified in the JOIN indicate what values are
evaluated for commonality.
To further illustrate the difference between INNER JOINs, LEFT JOINs, and RIGHT JOINs, we’ll use a silly example, joining the tables
O_WAREHOUSES and D_WAREHOUSES in several ways. The results will be meaningless from a business sense, but hopefully
illustrate the differences in these types of joins. First off, we’ll look at the contents of these tables for all WAREHOUSE_ID values
that start with ‘SDF’:
O_WAREHOUSES:

SELECT /*+ use_hash(ow) */
ow.WAREHOUSE_ID ow_warehouse_id
FROM O_WAREHOUSES ow
WHERE ow.WAREHOUSE_ID LIKE 'SDF_';

OW_WAREHOUSE_ID
SDF1
SDF2
SDF3
SDF4
SDF6

D_WAREHOUSES:

SELECT /*+ use_hash(dw) */
dw.WAREHOUSE_ID dw_warehouse_id
FROM D_WAREHOUSES dw
WHERE dw.WAREHOUSE_ID LIKE 'SDF_';

DW_WAREHOUSE_ID
SDF1
SDF2
SDF4
SDF6
As the results above show, the O_WAREHOUSES table has records for SDF1, SDF2, SDF3, SDF4 and SDF6, while the D_WAREHOUSES table
only has records for SDF1, SDF2, SDF4 and SDF6.
If we do an INNER JOIN of these two tables, we'll only get results where a match is found between the two tables (as defined by the
columns in our ON condition):
SELECT /*+ use_hash(ow,dw) */
ow.WAREHOUSE_ID ow_warehouse_id
, dw.WAREHOUSE_ID dw_warehouse_id
FROM O_WAREHOUSES ow
JOIN D_WAREHOUSES dw
ON ow.WAREHOUSE_ID = dw.WAREHOUSE_ID
WHERE ow.WAREHOUSE_ID LIKE 'SDF_';
OW_WAREHOUSE_ID  DW_WAREHOUSE_ID
SDF1             SDF1
SDF2             SDF2
SDF4             SDF4
SDF6             SDF6
Since D_WAREHOUSES doesn’t have a record for WAREHOUSE_ID SDF3, no result is returned from either table with an INNER JOIN.
If we change the query to a LEFT JOIN, the results will change:
SELECT /*+ use_hash(ow,dw) */
ow.WAREHOUSE_ID ow_warehouse_id
, dw.WAREHOUSE_ID dw_warehouse_id
FROM O_WAREHOUSES ow
LEFT JOIN D_WAREHOUSES dw
ON ow.WAREHOUSE_ID = dw.WAREHOUSE_ID
WHERE ow.WAREHOUSE_ID LIKE 'SDF_';
OW_WAREHOUSE_ID  DW_WAREHOUSE_ID
SDF1             SDF1
SDF2             SDF2
SDF3             
SDF4             SDF4
SDF6             SDF6
This time, we got results for all the records in O_WAREHOUSES, and the matching records (where they existed) in D_WAREHOUSES,
and got a NULL in the second column where it didn’t find a match.
The difference between a LEFT JOIN and a RIGHT JOIN is simply which tables are listed on the LEFT and RIGHT of the JOIN. In our last
example, O_WAREHOUSES is on the LEFT of the LEFT JOIN and D_WAREHOUSES is on the RIGHT of the LEFT JOIN. In a LEFT JOIN, the
table on the LEFT is given priority, and is the table that will return all results, even if no match is found in the table on the RIGHT of
the JOIN.
The same query could be written as a RIGHT JOIN and get the same results, simply by switching the order of the tables:
SELECT /*+ use_hash(ow,dw) */
ow.WAREHOUSE_ID ow_warehouse_id
, dw.WAREHOUSE_ID dw_warehouse_id
FROM D_WAREHOUSES dw
RIGHT JOIN O_WAREHOUSES ow
ON ow.WAREHOUSE_ID = dw.WAREHOUSE_ID
WHERE ow.WAREHOUSE_ID LIKE 'SDF_';
OW_WAREHOUSE_ID  DW_WAREHOUSE_ID
SDF1             SDF1
SDF2             SDF2
SDF3             
SDF4             SDF4
SDF6             SDF6
The difference between RIGHT and LEFT JOINs is strictly placement of table names in the SQL. To keep things simple, I always use
LEFT JOINs. But it’s no better or worse than switching between LEFT and RIGHT JOINs, or using RIGHT JOINs exclusively. I
recommend using whatever works best for you.
There are some additional types of JOINs described in the text, but these are rarely used and often wildly inefficient.
WHERE Clause Conditions with OUTER Joins
Regardless of whether you have an OUTER join specified or not, anything in your WHERE clause will limit your results. If you include
a condition in your WHERE clause that applies to the secondary table on the RIGHT of a LEFT JOIN (or on the LEFT of a RIGHT JOIN),
the query will not act like an OUTER join, because you’ve limited the results with conditions on both tables, making it behave like an
INNER join. You’ve essentially overridden the OUTER JOIN by limiting the results to only records that exist in the secondary table.
For example, if we added a WHERE clause condition that applies to the DESCRIPTION column in O_PAYMENT_ITEM_TYPES – which is
on the RIGHT of a LEFT JOIN, we get only results where that condition is true – making the query behave like an INNER join:
SELECT /*+ use_hash(o_pit,o_pt) */
o_pit.PAYMENT_ITEM_TYPE_ID
, o_pit.DESCRIPTION
, o_pt.PAYMENT_TYPE_ID
, o_pt.DESCRIPTION
FROM O_PAYMENT_TYPES o_pt
LEFT JOIN O_PAYMENT_ITEM_TYPES o_pit
ON o_pit.PAYMENT_ITEM_TYPE_ID = o_pt.PAYMENT_TYPE_ID
WHERE o_pit.DESCRIPTION = 'Refund';
PAYMENT_ITEM_TYPE_ID  DESCRIPTION  PAYMENT_TYPE_ID  DESCRIPTION_1
2                     Refund       2                zShops
One way around this problem is to place those conditions in the JOIN clause, like this:
SELECT /*+ use_hash(o_pit,o_pt) */
o_pit.PAYMENT_ITEM_TYPE_ID
, o_pit.DESCRIPTION
, o_pt.PAYMENT_TYPE_ID
, o_pt.DESCRIPTION
FROM O_PAYMENT_TYPES o_pt
LEFT JOIN O_PAYMENT_ITEM_TYPES o_pit
ON o_pit.PAYMENT_ITEM_TYPE_ID = o_pt.PAYMENT_TYPE_ID
AND o_pit.DESCRIPTION = 'Refund';
PAYMENT_ITEM_TYPE_ID  DESCRIPTION  PAYMENT_TYPE_ID  DESCRIPTION_1
                                   1                Auctions
2                     Refund       2                zShops
                                   5                zMe
                                   6                Marketplace
                                   7                MVP
                                   8                Catalogue
Putting the condition in the JOIN clause no longer limits the full query, as when it was in the WHERE clause, but it does still limit the
results. Think of it as limiting only the JOIN when it’s in the JOIN clause, but limiting the whole query when in the WHERE clause.
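A related pattern worth knowing (not shown in the text, but relied on by the homework below) is the reverse situation: using a WHERE condition on the secondary table to find records in the primary table that have no match at all, by testing the secondary table's join column for NULL. A minimal sketch, using the warehouse tables from the earlier OUTER join example:
SELECT /*+ use_hash(ow,dw) */
ow.WAREHOUSE_ID
FROM O_WAREHOUSES ow
LEFT JOIN D_WAREHOUSES dw
ON ow.WAREHOUSE_ID = dw.WAREHOUSE_ID
WHERE ow.WAREHOUSE_ID LIKE 'SDF_'
AND dw.WAREHOUSE_ID IS NULL;
Based on the results shown above, this should return only SDF3 – the one O_WAREHOUSES record with no match in D_WAREHOUSES. Because this condition only asks for the NULLs the OUTER join produces, it doesn't defeat the OUTER join the way other conditions on the secondary table do.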
One-to-Many Joins
As the final topic this week, I wanted to end with a warning about JOINs of all kinds, by introducing the concept of the ‘grain’ of a table. When people talk about the grain of a table, they mean the level of detail in that table. For example,
D_DISTRIBUTOR_ORDERS is at the grain of POs. That means it contains just one row of data for each Purchase Order. The related
table D_DISTRIBUTOR_ORDER_ITEMS is at the grain of the PO and ASIN, so it has one row for each unique combination of PO and
ASIN. The somewhat related table, D_DISTRIBUTOR_SHIPMENT_ITEMS contains all the records of PO items that have been received,
and its grain is PO, ASIN and Shipment – because a single ASIN can be received to a single PO on multiple occasions. Knowing the
grain of a table (usually by looking at some sample data) is important to understanding how to properly join to it.
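One quick way to sanity-check the grain you think a table has (a sketch, not part of the text) is to group by the columns you believe make up the grain and look for any combination that appears more than once:
SELECT /*+ use_hash(doi) */
doi.ORDER_ID
, doi.ISBN
, COUNT(*)
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
WHERE doi.REGION_ID = 1
AND doi.ORDER_DAY = to_date('20090115','YYYYMMDD')
GROUP BY doi.ORDER_ID
, doi.ISBN
HAVING COUNT(*) > 1;
If the grain really is PO/ASIN, this returns no rows; any rows it does return tell you the table is more granular than you assumed.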
If I join D_DISTRIBUTOR_ORDER_ITEMS (with a grain of PO/ASIN) to D_DISTRIBUTOR_SHIPMENT_ITEMS (with a grain of
PO/ASIN/Shipment) on the PO and ASIN columns (ORDER_ID and ISBN), the results look straightforward for PO L9549101:
SELECT /*+ use_hash(doi,dsi) */
doi.ORDER_ID
, doi.ISBN
, doi.QUANTITY_SUBMITTED
, doi.QUANTITY
, dsi.QUANTITY_UNPACKED
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
JOIN D_DISTRIBUTOR_SHIPMENT_ITEMS dsi
ON doi.ORDER_ID = dsi.ORDER_ID
AND doi.ISBN = dsi.ISBN
WHERE doi.REGION_ID = 1
AND doi.ORDER_DAY = to_date('20090115','YYYYMMDD')
AND doi.ORDER_ID = 'L9549101'
AND dsi.REGION_ID = 1
AND dsi.RECEIVED_DAY = to_date('20090119','YYYYMMDD')
ORDER_ID   ISBN         QUANTITY_SUBMITTED   QUANTITY   QUANTITY_UNPACKED
L9549101   0316032220   40                   40         40
It shows we submitted 40 units of ASIN 0316032220, 40 units were confirmed (QUANTITY), and 40 units were received
(QUANTITY_UNPACKED).
However, for an ASIN on a PO that was received in multiple shipments, things can look a little odd in the results:
SELECT /*+ use_hash(doi,dsi) */
doi.ORDER_ID
, doi.ISBN
, doi.QUANTITY_SUBMITTED
, doi.QUANTITY
, dsi.QUANTITY_UNPACKED
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
JOIN D_DISTRIBUTOR_SHIPMENT_ITEMS dsi
ON doi.ORDER_ID = dsi.ORDER_ID
AND doi.ISBN = dsi.ISBN
WHERE doi.REGION_ID = 1
AND doi.ORDER_DAY = to_date('20090126','YYYYMMDD')
AND doi.ORDER_ID = 'R1735263'
AND doi.ISBN = '0738210943'
AND dsi.REGION_ID = 1
AND dsi.RECEIVED_DAY BETWEEN to_date('20090205','YYYYMMDD')
AND to_date('20090213','YYYYMMDD')
ORDER_ID   ISBN        QUANTITY_SUBMITTED   QUANTITY   QUANTITY_UNPACKED
R1735263   738210943   19                   19         6
R1735263   738210943   19                   19         13
Because there are two records in the table D_DISTRIBUTOR_SHIPMENT_ITEMS that match to the PO and ASIN we are querying in
D_DISTRIBUTOR_ORDER_ITEMS, we get two records back. This is a One-to-Many join. Sometimes that’s just what you want, but in
this case, we might mistakenly think that we ordered 38 units (19+19), which is twice what we actually ordered.
We’ll explore some ways to avoid this issue later, but I wanted you to begin thinking about table granularity and to be aware of how it can result in one-to-many joins and possible double-counting of records.
Week 4 Homework:
1. Read Chapter 3 and the Appendix covering the old join syntax in Mastering Oracle SQL.
2. Check out the tables D_MP_ASINS_ESSENTIALS and D_ASINS_MARKETPLACE_ATTRIBUTES in BI Metadata. What are the Partitioned columns in each table? What do you think the grain is of each table? Which table includes the BINDING column? Which table contains the ITEM_NAME column?
3. Write a single query, joining those two tables together, to determine the name and binding of ASIN 0385240880 in the Marketplace related to your business. Be sure to make use of Partitions in your WHERE clause, and use the ‘use_hash’ hint in your SELECT clause. Run it through Explain Plan, then run it.
4. Edit the query to add the Manufacturer Code for the ASIN. Be sure to make use of Partitions. Run it through Explain Plan then run it. (hint: Look at the first example query from week 3, in the ETL section).
5. Create a query to pull the Vendor Code, Vendor Name, Business Group ID and Business Group Name for Vendor ID 3453. Run it through Explain Plan then run it. (hint: check out the first example query from week 4.)
6. Edit the query to pull a list of all Business IDs and Business Group Names for Canada that do not match to any Vendor Codes. Run it through Explain Plan then run it. (hint: look for records where the VENDOR_ID IS NULL).
7. Create a query that emails you every day with ASIN level details (including PO, Vendor, ASIN, and Quantity) of all receipts to
POs for vendor BATBO in the US. Let it run daily for a few days. Since D_DISTRIBUTOR_SHIPMENT_ITEMS is partitioned by
RECEIVED_DAY (in addition to REGION_ID), but we haven’t covered Dates yet, please include this in your WHERE clause:
AND RECEIVED_DAY = TO_DATE(‘{RUN_DATE_YYYYMMDD}’,’YYYYMMDD’), in addition to a condition on the other
partitioned column. Run it through Explain Plan before you schedule it.
Week 5 – Dealing with Dates in SQL
SQL Topics
 DATE vs. DATETIME columns
 The TO_CHAR() Function with Dates
 The TRUNC() Function with Dates
 The TO_DATE() Function
 Using BETWEEN with Dates
 Other Date functions
SQL TOPICS
DATE vs. DATETIME columns
While working with Data Warehouse tables, you’ll find two types of DATE columns: DATE columns that are truncated to only the
Month, Day, and Year information (e.g. 12/31/2008), and DATE columns that also contain the Hour, Minute, and Seconds (e.g.
12/31/2008 08:13:52) – known as the DATETIME format.
Both types of columns are of the Data Type ‘DATE’, and store full date & time information, but the DATE format columns are
truncated to the beginning of the first second of the day. Although it’s not always obvious from just looking at BI Metadata which
type a column is, most of the DATETIME fields have DATETIME in their name (like the columns ORDER_DATETIME and
CONFIRMATION_DATETIME in D_DISTRIBUTOR_ORDERS), while DATE type columns often use DATE or DAY in their column name
(like ORDER_DAY in D_DISTRIBUTOR_ORDERS). This isn’t a hard and fast rule, however, even within a single table. For example, the
column CREATION_DATE in D_DISTRIBUTOR_ORDERS is actually a DATETIME field, which we see via this query of the various date
fields in D_DISTRIBUTOR_ORDERS for PO M5969483.
SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.CREATION_DATE
, ddo.ORDER_DAY
, ddo.ORDER_DATETIME
, ddo.CONFIRMATION_DATETIME
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.ORDER_ID = 'M5969483';
ORDER_ID   CREATION_DATE     ORDER_DAY   ORDER_DATETIME    CONFIRMATION_DATETIME
M5969483   3/25/2009 18:14   3/25/2009   3/25/2009 11:14   3/25/2009 11:52
The TO_CHAR() Function with Dates
There are many ways to write a date, from the US standard of 03/31/2009 to the UK standard of 31/03/2009, writing them as March
31st, 2009, or combinations of words and numbers, like 31-MAR-09. Some of these formats can be very precise, while others are less
so. For example, if a Book was published on 31-MAR-09, do we know if it was published in 2009 or 1909? Unfortunately, we don’t,
and programs like Excel may make assumptions that could be wrong.
When writing SQL queries, you may find you want to control the format of a date column in your results, so you always know what
format it will be in and so there is never any question of exactly what the date means. To do this, we use the TO_CHAR() function,
which converts the DATE to a character string, in a format specified by you. To use the TO_CHAR() function, you include the column
name followed by a comma and then the format (enclosed in single quotes) within the parentheses. For example, to convert the ORDER_DATETIME to just the Month, Day, and Year, we put the ORDER_DATETIME column name in the TO_CHAR() function and then enter the format MM/DD/YYYY in single quotes, like this:
SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.ORDER_DATETIME
, TO_CHAR(ddo.ORDER_DATETIME,'MM/DD/YYYY')
, ddo.ORDER_DAY
, TO_CHAR(ddo.ORDER_DAY,'MM/DD/YYYY HH24:MI:SS')
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.ORDER_ID = 'M5969483';
ORDER_ID   ORDER_DATETIME          TO_CHAR(DDO.ORDER_DATETIME,'MM   ORDER_DAY   TO_CHAR(DDO.ORDER_DAY,'MM/DD/Y
M5969483   3/25/2009 11:14:56 AM   3/25/2009                        3/25/2009   03/25/2009 00:00:00
The format of the column returned in your results is what we specified, without the timestamp information. Also notice that we formatted the ORDER_DAY column to include the full DATETIME – hours, minutes and seconds – in column 5 of our results. It returns 03/25/2009 00:00:00 because the time information is always stored, but in a DATE column it is stored as the beginning of the first second of the day.
There are numerous formats you can use to get dates into the style you want, and you can mix-and-match components, as well. A
table begins on page 135 in Mastering Oracle SQL with a detailed list of options and their output, but here are some examples:
SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.ORDER_DATETIME
, TO_CHAR(ddo.ORDER_DATETIME,'YYYYMMDD')
, TO_CHAR(ddo.ORDER_DATETIME,'D')
, TO_CHAR(ddo.ORDER_DATETIME,'DAY')
, TO_CHAR(ddo.ORDER_DATETIME,'CC')
, TO_CHAR(ddo.ORDER_DATETIME,'HH AM" on a "Day", the "DDDTH" day of "YYYY"')
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.ORDER_ID = 'M5969483';
ORDER_ID   ORDER_DATETIME    YYYYMMDD   D   DAY         CC   HH AM" on a "Day", the "DDDTH" day of "YYYY"
M5969483   3/25/2009 11:14   20090325   4   WEDNESDAY   21   11 AM on a Wednesday, the 084TH day of 2009
You can get very simple (like finding the Century with CC) or very complex, such as creating a text string. Think about the format
that will be most meaningful to the people using your data. And don’t take for granted that a date field will output MM/DD/YYYY if
you don’t specify a format – ETL often seems to default to the troublesome DD-MON-YY format (e.g. 31-MAR-09).
The TRUNC() Function with Dates
Another type of conversion you can do to a DATE field is to truncate the date using the TRUNC() function. TRUNC() is used much like
TO_CHAR, but instead of translating the DATE field into a character string, it truncates it to the level you specify, but leaves it in a
DATE format. One common example is to truncate a date to the first day of the week, which can be done like this:
SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.ORDER_DATETIME
, TRUNC(ddo.ORDER_DATETIME,'D')
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.ORDER_ID = 'M5969483';
ORDER_ID   ORDER_DATETIME    TRUNC(DDO.ORDER_DATETIME,'D')
M5969483   3/25/2009 11:14   3/22/2009
You’ll notice that when we used TRUNC with the ‘D’ option, it truncated the ORDER_DATETIME of 3/25/2009 11:14 to the first
second of the first hour of the first day of the week: 3/22/2009. A similar option, ‘DDD’, will truncate a date to the first second of
the first hour of the same day – essentially chopping off the timestamp information from a DATETIME field:
SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.ORDER_DATETIME
, TRUNC(ddo.ORDER_DATETIME,'DDD')
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.ORDER_ID = 'M5969483';
ORDER_ID   ORDER_DATETIME    TRUNC(DDO.ORDER_DATETIME,'DDD'
M5969483   3/25/2009 11:14   3/25/2009
Superficially, this looks like the same result we got from the TO_CHAR() function, but because TRUNC() returns its result still in a DATE
format, we can perform math functions on the result, such as adding days, and logical functions like comparing to another date.
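For instance (a small sketch; date arithmetic is covered in more detail later this week), adding 6 to the truncated value gives the last day of that week – something you couldn’t do with the character string TO_CHAR() returns:
SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, TRUNC(ddo.ORDER_DATETIME,'D')
, TRUNC(ddo.ORDER_DATETIME,'D') + 6
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.ORDER_ID = 'M5969483';
For PO M5969483, that would return 3/22/2009 and 3/28/2009.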
Since truncating a DATETIME to the start of that day is probably the most common use of the TRUNC() function, the developers of
SQL made it the default. So you can get the same result as above by leaving off a format, saving yourself some time:
SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.ORDER_DATETIME
, TRUNC(ddo.ORDER_DATETIME)
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.ORDER_ID = 'M5969483';
ORDER_ID   ORDER_DATETIME    TRUNC(DDO.ORDER_DATETIME)
M5969483   3/25/2009 11:14   3/25/2009
Like TO_CHAR(), there are numerous options to choose from when using TRUNC(), which are listed in a table that begins on page
159 of Mastering Oracle SQL. Here are just a few examples, truncating to the beginning of the month, quarter, year, and century:
SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.ORDER_DATETIME
, TRUNC(ddo.ORDER_DATETIME,'MM')
, TRUNC(ddo.ORDER_DATETIME,'Q')
, TRUNC(ddo.ORDER_DATETIME,'Y')
, TRUNC(ddo.ORDER_DATETIME,'CC')
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.ORDER_ID = 'M5969483';
ORDER_ID   ORDER_DATETIME    MM         Q          Y          CC
M5969483   3/25/2009 11:14   3/1/2009   1/1/2009   1/1/2009   1/1/2001
The TO_DATE() Function
One frequent use of DATE columns, besides returning them in your results, is to use them in your WHERE clause to limit your results.
In fact, DATE columns are commonly used as partitions on tables, so this use is very common. A function called TO_DATE() comes in
handy when working with DATE columns in your WHERE clause. It’s essentially the opposite of the TO_CHAR() function – turning a
character string into a DATE format. This is vital, because you can’t compare a column that is in a DATE format to a text string – only
to a DATE. So when setting a conditional in your WHERE clause, you use the TO_DATE() function to translate a text string into a
DATE format, and then compare a DATE column to it. For example, if we wanted to see which POs have an ORDER_DAY of
3/25/2009, we’d compare the ORDER_DAY field to the text string 03/25/2009, but we’d convert that text string to a date before
doing the comparison using TO_DATE, like this:
SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.ORDER_DAY
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.LEGAL_ENTITY_ID = 101
AND ddo.DISTRIBUTOR_ID = 'RANDO'
AND ddo.ORDER_DAY = TO_DATE('03/25/2009','MM/DD/YYYY');
ORDER_ID   ORDER_DAY
P0618301   3/25/2009
M5969483   3/25/2009
The TO_DATE() function is taking the text string 03/25/2009, and converting it to a date format. The second part of the TO_DATE()
function indicates what format the text string is in, so it knows which numbers are the month, which are the day, and which are the
year. We could get the same results using a different format, as long as we change our text string to match that format:
SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.ORDER_DAY
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.LEGAL_ENTITY_ID = 101
AND ddo.DISTRIBUTOR_ID = 'RANDO'
AND ddo.ORDER_DAY = TO_DATE('20090325','YYYYMMDD');
ORDER_ID   ORDER_DAY
P0618301   3/25/2009
M5969483   3/25/2009
If the format of your text string and the format specified in the function are not the same, however, you’ll get an error. For example, the following would
cause an error, because the format of the text string (‘20090325’) is not the same as the format indicated in the function
(‘MM/DD/YYYY’):
SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.ORDER_DAY
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.LEGAL_ENTITY_ID = 101
AND ddo.DISTRIBUTOR_ID = 'RANDO'
AND ddo.ORDER_DAY = TO_DATE('20090325','MM/DD/YYYY');
ORA-12801: error signaled in parallel query server P054, instance db-dw2-6001.iad6.amazon.com:dw2-1 (1)
ORA-01843: not a valid month
Using BETWEEN with Dates
You can limit a DATE field to a specific date using the equal operator, but you can use other operators to build conditions in your WHERE clause, too. The BETWEEN operator is commonly used to define a specific date range that begins with the first date specified and ends with the last date specified. Below, the query is limited to the date range 3/23/2009 through 3/25/2009:
SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.ORDER_DAY
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.LEGAL_ENTITY_ID = 101
AND ddo.DISTRIBUTOR_ID = 'RANDO'
AND ddo.ORDER_DAY BETWEEN TO_DATE('20090323','YYYYMMDD')
AND TO_DATE('20090325','YYYYMMDD');
ORDER_ID   ORDER_DAY
M9119427   3/23/2009
M2666981   3/23/2009
U3517863   3/23/2009
R5273263   3/23/2009
N5183001   3/23/2009
T0475345   3/23/2009
M5969483   3/25/2009
P0618301   3/25/2009
It’s important when using BETWEEN with DATETIME fields to remember that the second date listed in the range (03/25/2009 in our
example) is the end of the range, and that a date of 03/25/2009 means the first second of the first minute of the first hour of that
day. It’s actually 03/25/2009 00:00:00. When working with fields that are in the DATE format that isn’t an issue, as the example
above shows.
However, if we changed the WHERE condition so that it was on the ORDER_DATETIME field, instead of the ORDER_DAY field, we’ll
see a problem:
SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.ORDER_DATETIME
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.LEGAL_ENTITY_ID = 101
AND ddo.DISTRIBUTOR_ID = 'RANDO'
AND ddo.ORDER_DATETIME BETWEEN TO_DATE('20090323','YYYYMMDD')
AND TO_DATE('20090325','YYYYMMDD');
ORDER_ID   ORDER_DATETIME
M9119427   3/23/2009 19:42
M2666981   3/23/2009 20:26
U3517863   3/23/2009 19:41
R5273263   3/23/2009 19:42
N5183001   3/23/2009 20:27
T0475345   3/23/2009 19:41
Even though our DATE range ends with 03/25/2009, we don’t get any results for that day – even though we know from our previous
example that 2 POs were created that day for RANDO. That’s because the ORDER_DATETIME values for those 2 POs were after
03/25/2009 00:00:00 – the start of 03/25/2009. Another way of saying that is that 03/25/2009 03:03:48 (the order datetime of PO
P0618301) is greater than 03/25/2009 00:00:00, so is outside the range specified by the BETWEEN clause.
We can solve this problem by using a DATE column for our WHERE clause, if one is available, or by using the TRUNC() function in our
WHERE clause, so that we’re comparing the ORDER_DATETIME value truncated to the start of the day to our BETWEEN range.
SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.ORDER_DATETIME
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.LEGAL_ENTITY_ID = 101
AND ddo.DISTRIBUTOR_ID = 'RANDO'
AND TRUNC(ddo.ORDER_DATETIME) BETWEEN TO_DATE('20090323','YYYYMMDD')
AND TO_DATE('20090325','YYYYMMDD');
ORDER_ID   ORDER_DATETIME
U3517863   3/23/2009 19:41
T0475345   3/23/2009 19:41
R5273263   3/23/2009 19:42
M9119427   3/23/2009 19:42
M2666981   3/23/2009 20:26
N5183001   3/23/2009 20:27
P0618301   3/25/2009 3:03
M5969483   3/25/2009 11:14
Now the results show the two POs placed on 3/25/2009, because the truncated version of the ORDER_DATETIME field is within the
date range. You could also change the second date in the range to be one date larger (03/26/2009 in our example) without using
the TRUNC() function, but then you’d risk getting results that happened to occur at 03/26/2009 00:00:00, which is a possibility with
some data sets. Using TRUNC() is a cleaner, safer, and easier method. When in doubt, use TRUNC().
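If you’d rather not wrap the column in a function at all, a common alternative (a sketch, not from the text) is to drop BETWEEN and use an explicit pair of comparisons: >= the start of the range and < the day after the end of the range:
SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.ORDER_DATETIME
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.LEGAL_ENTITY_ID = 101
AND ddo.DISTRIBUTOR_ID = 'RANDO'
AND ddo.ORDER_DATETIME >= TO_DATE('20090323','YYYYMMDD')
AND ddo.ORDER_DATETIME < TO_DATE('20090326','YYYYMMDD');
This covers every timestamp on 3/23 through 3/25 while excluding anything that happens at exactly 03/26/2009 00:00:00.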
Adding and Subtracting with Dates
Just like numerical fields, you can add to and subtract from DATE column values, both in your SELECT and WHERE clauses. When adding to or subtracting from DATE column values, a value of 1 is equal to 1 day, not 1 hour, 1 minute, or 1 second. If we add 1
to the ORDER_DATETIME values we returned in our last example, we see it increases the DATE value by 1 full day:
SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.ORDER_DATETIME
, ddo.ORDER_DATETIME + 1
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.LEGAL_ENTITY_ID = 101
AND ddo.DISTRIBUTOR_ID = 'RANDO'
AND TRUNC(ddo.ORDER_DATETIME) BETWEEN TO_DATE('20090323','YYYYMMDD')
AND TO_DATE('20090325','YYYYMMDD');
ORDER_ID   ORDER_DATETIME    DDO.ORDER_DATETIME+1
M9119427   3/23/2009 19:42   3/24/2009 19:42
P0618301   3/25/2009 3:03    3/26/2009 3:03
M2666981   3/23/2009 20:26   3/24/2009 20:26
U3517863   3/23/2009 19:41   3/24/2009 19:41
R5273263   3/23/2009 19:42   3/24/2009 19:42
M5969483   3/25/2009 11:14   3/26/2009 11:14
N5183001   3/23/2009 20:27   3/24/2009 20:27
T0475345   3/23/2009 19:41   3/24/2009 19:41
Thus, the value 3/23/2009 19:42 becomes 3/24/2009 19:42 – one full day later. (To add hours, minutes or seconds to a date, use a
fraction, such as 1/24 to add an hour, or 20/1440 to add twenty minutes.)
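Here’s a quick sketch of those fractions in isolation, using SYSDATE (the current date and time) and the standard one-row Oracle table DUAL, neither of which is specific to the Data Warehouse:
SELECT SYSDATE               -- right now
, SYSDATE + 1/24             -- one hour from now
, SYSDATE + 20/1440          -- twenty minutes from now
, SYSDATE - 30/86400         -- thirty seconds ago
FROM DUAL;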
Perhaps a more common use is to add and subtract days from a date value in your WHERE clause. For example, we could rewrite
our query to change the BETWEEN range a bit, like this:
SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.ORDER_DATETIME
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.LEGAL_ENTITY_ID = 101
AND ddo.DISTRIBUTOR_ID = 'RANDO'
AND TRUNC(ddo.ORDER_DATETIME) BETWEEN TO_DATE('20090325','YYYYMMDD')-2
AND TO_DATE('20090325','YYYYMMDD');
ORDER_ID   ORDER_DATETIME
U3517863   3/23/2009 19:41
T0475345   3/23/2009 19:41
R5273263   3/23/2009 19:42
M9119427   3/23/2009 19:42
M2666981   3/23/2009 20:26
N5183001   3/23/2009 20:27
P0618301   3/25/2009 3:03
M5969483   3/25/2009 11:14
Instead of the start of the range being 03/23/2009, we’ve made it 2 days prior to the date 03/25/2009. This may seem strange, but
we’ll see how that can be very helpful in just a minute, when we talk about the Run Date Wildcard available in ETL Manager.
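As a preview, here’s a sketch of how that might look in a scheduled job, using the {RUN_DATE_YYYYMMDD} wildcard that appears in the homework to always cover the 7 days ending with the run date:
SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.ORDER_DAY
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.LEGAL_ENTITY_ID = 101
AND ddo.DISTRIBUTOR_ID = 'RANDO'
AND ddo.ORDER_DAY BETWEEN TO_DATE('{RUN_DATE_YYYYMMDD}','YYYYMMDD') - 7
AND TO_DATE('{RUN_DATE_YYYYMMDD}','YYYYMMDD');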
Other Date functions
Although TO_CHAR(), TRUNC(), and TO_DATE() are probably the most commonly used DATE functions, SQL includes several more
that you may find useful. These include:
 ROUND( date , format ) – used to round a date up or down to the nearest day, month, year, etc.
 ADD_MONTHS( date , number of months ) – used to add (or subtract) months from a date
 LAST_DAY( date ) – used to determine the last day of the month the date falls in
 NEXT_DAY( date , weekday ) – used to find the date of the first occurrence of the specified weekday after the date specified
 MONTHS_BETWEEN( later date , earlier date ) – used to determine how many months are between two dates
Here are some examples of these functions in action:
SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.ORDER_DAY
, ROUND(ddo.ORDER_DAY,'D')
, ADD_MONTHS(ddo.ORDER_DAY,-5)
, LAST_DAY(ddo.ORDER_DAY)
, NEXT_DAY(ddo.ORDER_DAY,'Friday')
, MONTHS_BETWEEN(ddo.ORDER_DAY,TO_DATE('20090101','YYYYMMDD'))
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.ORDER_ID = 'M5969483';
ORDER_ID   ORDER_DAY   ROUND()     ADD_MONTHS()   LAST_DAY    NEXT_DAY    MONTHS_BETWEEN
M5969483   3/25/2009   3/22/2009   10/25/2008     3/31/2009   3/27/2009   2.774193548
(We subtracted months using ADD_MONTHS and -5 as our number of months.) There’s more information on using these fields in
your text. You can also use many of the standard aggregate functions, like AVG(), COUNT(), MAX(), and MIN() on DATE fields.
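For example, a sketch of MIN() and MAX() on a DATE column, finding the first and last order days for a distributor (much like the Week 3 homework):
SELECT /*+ use_hash(ddo) */
MIN(ddo.ORDER_DAY)
, MAX(ddo.ORDER_DAY)
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.LEGAL_ENTITY_ID = 101
AND ddo.DISTRIBUTOR_ID = 'RANDO';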
Week 5 Homework:
1. Read Chapter 6 of Mastering Oracle SQL. Pay close attention to tables 6-1 (pg 135) and 6-2 (pg 159).
2. Write a query to determine on what date the record in D_WAREHOUSES for the FC PHL1 was created. Use the TO_CHAR function to ensure the date is returned in the format MM/DD/YYYY.
3. Edit the query to determine what the first and last days of the week that record was created were, and format the dates in the UK standard format (e.g. 31/10/2008).
4. Write a query to find the WAREHOUSE_ID for all records in the D_WAREHOUSES table that were not created on 1/20/2009, for FCs outside of North America. (hint: you’ll need to use TRUNC() and TO_DATE(), and you should get about 6 records returned.)
5. Edit the query to determine what day of the week each of those records were created.
6. Write a query to pull a list of all PO and ASINs, with their submitted quantities and order dates, for the vendor code ‘DCCOM’, during the date range of 1/1/2009 through 1/15/2009, in the US. Be sure to make use of partitioned columns in your WHERE clause, and run your query through Explain Plan before scheduling it.
7. Edit the query to sum up the quantity field by ASIN, removing the order date and PO fields.
8. Write a query against the table D_MP_ASINS_ESSENTIALS to pull the ASIN, ITEM_NAME, STREET_DAY, and PUBLICATION_DAY columns for any US Books ASIN with a PUBLICATION_DAY greater than 1/1/2020.
9. Edit the query to create an element that returns the STREET_DAY if it’s not null, but returns the PUBLICATION_DAY if STREET_DAY is null. (hint: use the NVL() function to return pub date when street date is null.) This is a standard method used to determine release date.
10. Create a query that emails you a summary of all the POs you created the previous day, with count of ASINs and total units submitted for each PO, as well as any other details you’re interested in, such as order type and vendor code (use BI Metadata to find what fields are available). Schedule this query to run daily, and let it run for at least 7 days. If you’re not a buyer, select the login of a buyer to use for your query. (hint: you’ll need to use D_DISTRIBUTOR_ORDER_ITEMS to get the ASIN level info.)
Week 6 – Subqueries
SQL Topics
 Subqueries
 Avoiding 1-to-many joins
SQL TOPICS
Subqueries
A subquery is a whole SQL statement that’s nested within another SQL statement – like a query within a query. The subquery runs first, its results are held temporarily in memory – like a temporary table – and then discarded when the full SQL statement is done running. Subqueries can be placed in the FROM clause and incorporated into a JOIN, or (less commonly, due to efficiency issues) in the WHERE clause to limit the results of the outer query. Here are examples of each:
FROM Clause JOIN to a Subquery:
SELECT /*+ use_hash(dma,ords) */
dma.ASIN
, dma.ITEM_NAME
, ords.QUANTITY_SUBMITTED
FROM D_MP_ASINS_ESSENTIALS dma
JOIN (SELECT /*+ use_hash(doi) */
doi.ISBN
, doi.QUANTITY_SUBMITTED
FROM d_distributor_order_items doi
WHERE doi.REGION_ID = 1
AND doi.ORDER_DAY = to_date('20090406','YYYYMMDD')
AND doi.ORDER_ID = 'S2236807') ords
ON dma.ASIN = ords.ISBN
WHERE dma.REGION_ID = 1
AND dma.MARKETPLACE_ID = 1;

ASIN         ITEM_NAME                                           QUANTITY_SUBMITTED
037584726X   The Big Book of Princesses (Giant Coloring Book)    7
0789399903   Skylines: American Cities Yesterday and Today       1

WHERE Clause limit using a Subquery:
SELECT /*+ use_hash(dma) */
dma.ASIN
, dma.ITEM_NAME
FROM D_MP_ASINS_ESSENTIALS dma
WHERE dma.REGION_ID = 1
AND dma.MARKETPLACE_ID = 1
AND dma.ASIN IN (SELECT /*+ use_hash(doi) */
doi.ISBN
FROM d_distributor_order_items doi
WHERE doi.REGION_ID = 1
AND doi.ORDER_DAY = to_date('20090406','YYYYMMDD')
AND doi.ORDER_ID = 'S2236807');

ASIN         ITEM_NAME
037584726X   The Big Book of Princesses (Giant Coloring Book)
0789399903   Skylines: American Cities Yesterday and Today
Subqueries are just like any SQL statement, but are enclosed in parentheses within another query. I think of the results of that subquery as a table – so when you JOIN to a subquery, you alias it like you would a table, because you’ll need to reference its columns in the JOIN condition, and you may want to return some of its columns in your results.
Stepping back to our first example of a subquery in the FROM clause, we see that we’ve inserted a full SELECT/FROM/WHERE query,
enclosed in parentheses in the FROM clause, and inner JOINed to it to effectively limit the ASINs in the table
D_MP_ASINS_ESSENTIALS to only those that match to the ASINs returned by the subquery – namely the ASINs on PO S2236807.
SELECT /*+ use_hash(dma,ords) */
dma.ASIN
, dma.ITEM_NAME
, ords.QUANTITY_SUBMITTED
FROM D_MP_ASINS_ESSENTIALS dma
JOIN (SELECT /*+ use_hash(doi) */
doi.ISBN
, doi.QUANTITY_SUBMITTED
FROM d_distributor_order_items doi
WHERE doi.REGION_ID = 1
AND doi.ORDER_DAY = to_date('20090406','YYYYMMDD')
AND doi.ORDER_ID = 'S2236807') ords
ON dma.ASIN = ords.ISBN
WHERE dma.REGION_ID = 1
AND dma.MARKETPLACE_ID = 1;
ASIN         ITEM_NAME                                           QUANTITY_SUBMITTED
037584726X   The Big Book of Princesses (Giant Coloring Book)    7
0789399903   Skylines: American Cities Yesterday and Today       1
The subquery could be run on its own, giving you the list of ASINs – which is the first thing that happens when the SQL statement
runs. It runs the subquery, and then stores the results like a temporary table.
SELECT /*+ use_hash(doi) */
doi.ISBN
, doi.QUANTITY_SUBMITTED
FROM d_distributor_order_items doi
WHERE doi.REGION_ID = 1
AND doi.ORDER_DAY = to_date('20090406','YYYYMMDD')
AND doi.ORDER_ID = 'S2236807';
ISBN         QUANTITY_SUBMITTED
0789399903   1
037584726X   7
Then the query JOINs the table D_MP_ASINS_ESSENTIALS to that temporary table, limiting the results because it’s an INNER join, but
also returning information from that temporary table – the QUANTITY_SUBMITTED.
Of course, you can also do an OUTER JOIN to a subquery, such as in this example:
SELECT /*+ use_hash(dma,ords) */
dma.ASIN
, dma.ITEM_NAME
, ords.QUANTITY_SUBMITTED
FROM D_MP_ASINS_ESSENTIALS dma
LEFT JOIN (SELECT /*+ use_hash(doi) */
doi.ISBN
, doi.QUANTITY_SUBMITTED
FROM d_distributor_order_items doi
WHERE doi.REGION_ID = 1
AND doi.ORDER_DAY = to_date('20090406','YYYYMMDD')
AND doi.ORDER_ID = 'S2236807') ords
ON dma.ASIN = ords.ISBN
WHERE dma.REGION_ID = 1
AND dma.MARKETPLACE_ID = 1
AND dma.ASIN IN ('037584726X','0789399903','0394873742');
ASIN         ITEM_NAME                                           QUANTITY_SUBMITTED
0789399903   Skylines: American Cities Yesterday and Today       1
037584726X   The Big Book of Princesses (Giant Coloring Book)    7
0394873742   Richard Scarry's Biggest Word Book Ever!
In this case, the LEFT OUTER JOIN resulted in all results being pulled from the left table (D_MP_ASINS_ESSENTIALS) and results from
the table on the right (our subquery) were returned, if available.
Avoiding 1-to-many joins
One of the many uses of subqueries is to avoid 1-to-many joins – situations where the grain of one table is different than the grain of
another, which can result in errors. Here’s an example of a 1-to-many join that causes a problem, from data tables that hold
Problem Receive information.
In the table O_RECEIVE_PROBLEM_ITEMS, we find one record associated with RECEIVE_PROBLEM_ITEM_ID 5739750, which shows a
QUANTITY of 1 was received into Problem Receive for ASIN B00158THNW.
SELECT /*+ use_hash(rpi) */
rpi.RECEIVE_PROBLEM_ITEM_ID
, rpi.ASIN
, rpi.QUANTITY
FROM O_RECEIVE_PROBLEM_ITEMS rpi
WHERE rpi.RECEIVE_PROBLEM_ITEM_ID IN (5739750);
RECEIVE_PROBLEM_ITEM_ID   ASIN         QUANTITY
5739750                   B00158THNW   1
And in the table O_RPI_PROBLEM_LIST, we find that there are two records associated with that same RECEIVE_PROBLEM_ITEM_ID,
one for each of the two problem types found to have occurred for that item.
SELECT /*+ use_hash(rpl) */
rpl.RECEIVE_PROBLEM_ITEM_ID
, rpl.RECEIVE_PROBLEM_TYPE
FROM O_RPI_PROBLEM_LIST rpl
WHERE rpl.RECEIVE_PROBLEM_ITEM_ID IN (5739750);
RECEIVE_PROBLEM_ITEM_ID   RECEIVE_PROBLEM_TYPE
5739750                   OVERAGE
5739750                   WRONG_DC
Based on these two queries, we know that the one unit of ASIN B00158THNW recorded as RPI ID 5739750 had two problems. It was
an OVERAGE on the PO and it was delivered to the WRONG_DC.
If we join the two tables, the one record in the first table is duplicated for each record in the second table, including QUANTITY:
SELECT /*+ use_hash(rpi,rpl) */
rpi.RECEIVE_PROBLEM_ITEM_ID
, rpi.ASIN
, rpl.RECEIVE_PROBLEM_TYPE
, rpi.QUANTITY
FROM O_RECEIVE_PROBLEM_ITEMS rpi
JOIN O_RPI_PROBLEM_LIST rpl
ON rpi.RECEIVE_PROBLEM_ITEM_ID = rpl.RECEIVE_PROBLEM_ITEM_ID
AND rpi.WAREHOUSE_ID = rpl.WAREHOUSE_ID
WHERE rpi.RECEIVE_PROBLEM_ITEM_ID IN (5739750);
RECEIVE_PROBLEM_ITEM_ID   ASIN         RECEIVE_PROBLEM_TYPE   QUANTITY
5739750                   B00158THNW   OVERAGE                1
5739750                   B00158THNW   WRONG_DC               1
Based on this data, one might think that there were 2 units that arrived, not 1. The problem gets even less obvious when we
aggregate the query, counting the RECEIVE_PROBLEM_TYPE and summing the QUANTITY from our results:
SELECT /*+ use_hash(rpi,rpl) */
rpi.RECEIVE_PROBLEM_ITEM_ID
, rpi.ASIN
, COUNT(rpl.RECEIVE_PROBLEM_TYPE)
, SUM(rpi.QUANTITY)
FROM O_RECEIVE_PROBLEM_ITEMS rpi
JOIN O_RPI_PROBLEM_LIST rpl
ON rpi.RECEIVE_PROBLEM_ITEM_ID = rpl.RECEIVE_PROBLEM_ITEM_ID
AND rpi.WAREHOUSE_ID = rpl.WAREHOUSE_ID
WHERE rpi.RECEIVE_PROBLEM_ITEM_ID IN (5739750)
GROUP BY rpi.RECEIVE_PROBLEM_ITEM_ID
, rpi.ASIN;
RECEIVE_PROBLEM_ITEM_ID   ASIN         COUNT(RPL.RECEIVE_PROBLEM_TYPE   SUM(RPI.QUANTITY)
5739750                   B00158THNW   2                                2
One way we could get around this problem is to use a subquery to aggregate the results from the O_RPI_PROBLEM_LIST table first,
then join them to the O_RECEIVE_PROBLEM_ITEMS table:
SELECT /*+ use_hash(rpi,rpl2) */
rpi.RECEIVE_PROBLEM_ITEM_ID
, rpi.ASIN
, rpl2.PROBLEM_COUNT
, rpi.QUANTITY
FROM O_RECEIVE_PROBLEM_ITEMS rpi
JOIN (SELECT /*+ use_hash(rpl) */
rpl.RECEIVE_PROBLEM_ITEM_ID
, rpl.WAREHOUSE_ID
, COUNT(rpl.RECEIVE_PROBLEM_TYPE) PROBLEM_COUNT
FROM O_RPI_PROBLEM_LIST rpl
WHERE rpl.RECEIVE_PROBLEM_ITEM_ID IN (5739750)
GROUP BY rpl.RECEIVE_PROBLEM_ITEM_ID
, rpl.WAREHOUSE_ID
) rpl2
ON rpi.RECEIVE_PROBLEM_ITEM_ID = rpl2.RECEIVE_PROBLEM_ITEM_ID
AND rpi.WAREHOUSE_ID = rpl2.WAREHOUSE_ID
WHERE rpi.RECEIVE_PROBLEM_ITEM_ID IN (5739750);
RECEIVE_PROBLEM_ITEM_ID   ASIN         PROBLEM_COUNT   QUANTITY
5739750                   B00158THNW   2               1
Now we get the proper results, showing the quantity of 1 unit, with 2 problems.
Notice that we aliased the column COUNT(rpl.RECEIVE_PROBLEM_TYPE) to PROBLEM_COUNT in our subquery, then referred to the
column by its alias in the outer query. Because the subquery is executed by Oracle first, and the results are saved as a table that’s
then used for the outer query, any column aliases in the subquery are now the column names of that temporary table, and that’s
how you must refer to them in the outer query.
This type of subquery is something that can be used any time you have two tables with a different grain of data that you need to
join, such as when you want to join PO ASIN information from D_DISTRIBUTOR_ORDER_ITEMS to PO ASIN Shipment information
from D_DISTRIBUTOR_SHIPMENT_ITEMS.
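For instance, here’s a sketch of how the Week 4 one-to-many example could be handled with this technique: sum QUANTITY_UNPACKED per PO and ASIN in a subquery, so both sides of the JOIN are at the PO/ASIN grain, and then join:
SELECT /*+ use_hash(doi,dsi2) */
doi.ORDER_ID
, doi.ISBN
, doi.QUANTITY_SUBMITTED
, doi.QUANTITY
, dsi2.TOTAL_UNPACKED
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
JOIN (SELECT /*+ use_hash(dsi) */
dsi.ORDER_ID
, dsi.ISBN
, SUM(dsi.QUANTITY_UNPACKED) TOTAL_UNPACKED
FROM D_DISTRIBUTOR_SHIPMENT_ITEMS dsi
WHERE dsi.REGION_ID = 1
AND dsi.RECEIVED_DAY BETWEEN to_date('20090205','YYYYMMDD')
AND to_date('20090213','YYYYMMDD')
GROUP BY dsi.ORDER_ID
, dsi.ISBN) dsi2
ON doi.ORDER_ID = dsi2.ORDER_ID
AND doi.ISBN = dsi2.ISBN
WHERE doi.REGION_ID = 1
AND doi.ORDER_DAY = to_date('20090126','YYYYMMDD')
AND doi.ORDER_ID = 'R1735263'
AND doi.ISBN = '0738210943';
Based on the Week 4 example data, this should return a single row with QUANTITY_SUBMITTED of 19 and TOTAL_UNPACKED of 19 (6 + 13), instead of two rows that double-count the ordered quantity.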
Week 6 Homework:
As always, remember to include WHERE clause conditions on any and all Partitioned columns, and run any query you write through
Explain Plan before running it.
1. Read Chapter 5 of Mastering Oracle SQL
2. Create a segment of the following ASINs:
0394873742
037584726X
0789399903
0345431391
0375847278
037584726X
0887767702
3. Create a query that emails you the list of distinct ASINs in this segment.
4. Use the query you created as a subquery in a query to find a list of all POs that were placed on 4/6/2009 in the US that included those ASINs. In your results, include the PO, Vendor Code, ASIN, and quantity confirmed. (Hint: check out D_DISTRIBUTOR_ORDER_ITEMS. The column QUANTITY_ORDERED indicates the number of units confirmed.)
5. Edit the query to switch out the condition on Legal Entity ID in your WHERE clause to use the Legal Entity ID wildcard, and rerun the query. Make sure your Job is set up to be partitioned by Legal Entity ID.
6. Edit the query to switch out the segment ID in your subquery for the Free Form Tag Wildcard, and edit your job to put the segment ID in the Free Form Tag field.
Extra Credit
7. Edit the query to add a JOIN to the VENDORS table to get the name of each Vendor.
8. Edit the query to add a JOIN to the D_MP_ASINS_ESSENTIALS table to get the title of each ASIN.
9. Edit the query to remove the PO Field, and sum the number of units confirmed per ASIN, per Vendor.
10. Edit the query to return only those ASINs where the sum of units ordered was greater than 3.
Week 7 –DECODE & CASE
SQL Topics
 The DECODE() Function
 The CASE Function
SQL TOPICS
The DECODE() Function
DECODE() is one of SQL’s functions that provides If-Then functionality. It’s essentially a way to translate or decode the values in a column to other values. The format is DECODE(A,B,C,D), which functions as: if A is equal to B, then return C; otherwise, return D. It’s very much like the Excel function =IF(A=B,C,D).
The first value (A) is generally a column in one of the tables you’re querying, and B is a value that would be found in that column of
that table. C is what you want that value translated to in your results, and D is what you want returned in that column of your
results if A doesn’t match B. The B and C spots in the function can be repeated, giving you the ability to translate any of several
values in a single column to new values in your results (e.g. DECODE(A,B1,C1,B2,C2,B3,C3,D) ).
One example is the need to translate Order Type numbers to Order Type codes – such as translating the number 17 to NP and 9 to
LA – because PO Order Type is stored in all the key tables (e.g. D_DISTRIBUTOR_ORDERS) as a number. There is a table in the Data
Warehouse that translates the number to text, but it doesn’t translate it to the two character code folks are familiar with:
SELECT /*+ use_hash(vot) */
vot.VENDOR_ORDER_TYPE
, vot.VENDOR_ORDER_TYPE_DESC
FROM VENDOR_ORDER_TYPES vot
WHERE vot.VENDOR_ORDER_TYPE IN (0,4);
VENDOR_ORDER_TYPE   VENDOR_ORDER_TYPE_DESC
0                   None Specified / Distributor O
2                   Special Order
As the sample above shows, the table includes a description, which isn’t always clear. For example, ‘Pubdirect Order’ is the
Advantage Order Type, and ‘None Specified/Distributor O’ is actually DS. Most folks seem to talk about these in terms of Order Type
Codes (like DS and LA), so it can be very useful to translate to those values when you run your queries. You can use DECODE() to do
this:
SELECT /*+ use_hash(vot) */
vot.VENDOR_ORDER_TYPE
, DECODE(vot.VENDOR_ORDER_TYPE,0,'DS',NULL)
FROM VENDOR_ORDER_TYPES vot
WHERE vot.VENDOR_ORDER_TYPE IN (0,4);
VENDOR_ORDER_TYPE   DECODE(VOT.VENDOR_ORDER_TYPE,0
0                   DS
4
Here we’ve decoded the VENDOR_ORDER_TYPE column, and anytime the value in that column is 0, we return ‘DS’ as the result,
otherwise it returns NULL. So for 0 we get DS, and for 4 we get a null returned.
Translating one value is useful, but DECODE() can be used for multiple values, allowing you to specify what you want returned for
each. Here’s an example where we are decoding multiple values (0, 2, and 4) to what we want to see returned (DS, SP, and PD):
SELECT /*+ use_hash(vot) */
vot.VENDOR_ORDER_TYPE
, DECODE(vot.VENDOR_ORDER_TYPE,0,'DS',2,'SP',4,'PD','Unknown')
FROM VENDOR_ORDER_TYPES vot
WHERE vot.VENDOR_ORDER_TYPE IN (0,2,3,4);
VENDOR_ORDER_TYPE   DECODE(VOT.VENDOR_ORDER_TYPE,0
0                   DS
2                   SP
3                   Unknown
4                   PD
In this example, the DECODE is translating the values in the column vot.VENDOR_ORDER_TYPE. When it finds a value in that column
that’s equal to 0, it returns the text string ‘DS’. When it finds 2, it returns ‘SP’. When it finds 4, it returns ‘PD’, and if it finds anything
else (3 in this example) it returns ‘Unknown’.
The values returned can be text (as in the examples above), a number, or even another column. For example, we could change the
‘Unknown’ value in the above query to the VENDOR_ORDER_TYPE_DESC column:
SELECT /*+ use_hash(vot) */
vot.VENDOR_ORDER_TYPE
, DECODE(vot.VENDOR_ORDER_TYPE,0,'DS',2,'SP',4,'PD',vot.VENDOR_ORDER_TYPE_DESC)
FROM VENDOR_ORDER_TYPES vot
WHERE vot.VENDOR_ORDER_TYPE IN (0,2,3,4);
VENDOR_ORDER_TYPE   DECODE(VOT.VENDOR_ORDER_TYPE,0
0                   DS
2                   SP
3                   Publisher Order
4                   PD
Instead of ‘Unknown’, we get the value of the column VENDOR_ORDER_TYPE_DESC for any row that doesn’t match one of the values we’ve already defined in the DECODE() statement – in this case, order type number 3.
You can keep adding pairs of values to translate various values, up to about 125 pairs. For example, here’s the full decode to
translate the numbers to the code for most of the current Order Type values:
SELECT /*+ use_hash(vot) */
vot.VENDOR_ORDER_TYPE
,DECODE(vot.VENDOR_ORDER_TYPE,0,'DS',1,'OP',2,'SP',3,'PB',4,'PD',6,'SU',7,'IS',8,'MS',9,'LA',
10,'LB',11,'LC',12,'LD',13,'SA',14,'SB',15,'SC',16,'SD',17,'NP',18,'RE',19,'VP',20,'MU',
21,'T1',22,'T2',23,'T3',24,'B1',25,'B2',26,'B3',27,'M1',28,'M2',29,'M3',30,'R1',31,'R2',
32,'R3',33,'PT',34,'DR',35,'MX',vot.VENDOR_ORDER_TYPE) AS ORDER_TYPE
FROM VENDOR_ORDER_TYPES vot;
The CASE Function
The CASE function is similar to DECODE, but with more advanced options. With CASE, you can evaluate not just if a column is equal
to a value, but if an expression is true, and return your results depending on whether or not that expression is true.
Here’s the same example we explored with DECODE above, but using CASE:
SELECT /*+ use_hash(vot) */
vot.VENDOR_ORDER_TYPE
, CASE WHEN vot.VENDOR_ORDER_TYPE = 0 THEN 'DS'
WHEN vot.VENDOR_ORDER_TYPE = 2 THEN 'SP'
WHEN vot.VENDOR_ORDER_TYPE = 4 THEN 'PD'
ELSE 'Unknown' END
FROM VENDOR_ORDER_TYPES vot
WHERE vot.VENDOR_ORDER_TYPE IN (0,2,3,4);
VENDOR_ORDER_TYPE   CASEWHENVOT.VENDOR_ORDER_TYPE=
0                   DS
2                   SP
3                   Unknown
4                   PD
In this example, we again evaluated the vot.VENDOR_ORDER_TYPE column, using the equal operator to see if it was equal to various
values. This is functionally identical to what DECODE does, just in a different way.
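For completeness (this form isn’t covered in the text), CASE also has a ‘simple’ syntax that names the column once and then lists the values to match, which reads even more like DECODE:
SELECT /*+ use_hash(vot) */
vot.VENDOR_ORDER_TYPE
, CASE vot.VENDOR_ORDER_TYPE
WHEN 0 THEN 'DS'
WHEN 2 THEN 'SP'
WHEN 4 THEN 'PD'
ELSE 'Unknown' END
FROM VENDOR_ORDER_TYPES vot
WHERE vot.VENDOR_ORDER_TYPE IN (0,2,3,4);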
CASE really shows its value when you use other types of operators (rather than equals), or when it’s evaluating multiple columns.
In the example below, we evaluate the DEAL_CODE column to see if it’s NULL, and if it’s not NULL, return ‘Deal Buy’. If it is NULL,
then we move on to the next WHEN/THEN combo, which checks the ORDER_TYPE column to see if it’s a 9, in which case it returns
‘LA’, and so on.
SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, ddo.DEAL_CODE
, ddo.ORDER_TYPE
, CASE WHEN ddo.DEAL_CODE IS NOT NULL THEN 'Deal Buy'
WHEN ddo.ORDER_TYPE = 9 THEN 'LA'
WHEN ddo.ORDER_TYPE = 2 THEN 'SP'
ELSE NULL END
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.ORDER_ID IN ('L3937793','B3074533','Q9166581');
ORDER_ID   DEAL_CODE     ORDER_TYPE   CASEWHENDDO.DEAL_CODEISNOTNULL
B3074533                 2            SP
L3937793                 9            LA
Q9166581   D0000001069   9            Deal Buy
It’s important to note that the first WHEN/THEN combination in a CASE statement is the first that’s evaluated, and if it’s true, the
following WHEN/THEN combinations aren’t evaluated, even if they’re true. In the example above, the first evaluation found that the
DEAL_CODE column was NOT NULL for the third record, so it returned ‘Deal Buy’ and stopped evaluating the rest of the CASE
statement. So even though the ORDER_TYPE was 9 (the second WHEN/THEN combination), because the previous WHEN/THEN was
true, the CASE statement stopped. So the order you enter your WHEN/THEN combinations in a CASE statement can impact your
results.
The NVL2 Function
Back in Week X we discussed the NVL() function, which translates any Null values to whatever you specify, and leaves Non-Null
values as is. A related but slightly more powerful function is NVL2(). NVL2() gives you the option of translating the Non-Null values
to something else, too.
The format is NVL2(A,B,C) – where A is the column or element to evaluate, B is what to return if it’s NOT Null, and C is what to return
if it IS Null.
For example, we might want to return a ‘Y’ if we find a Non-Null and an ‘N’ if we find a Null, as when we’re flagging which ASINs are Textbooks:
SELECT /*+ use_hash(dma,dmma) */
dma.ASIN
, dma.ITEM_NAME
, dmma.TEXTBOOK_TYPE
, NVL2(dmma.TEXTBOOK_TYPE,'Y','N')
FROM D_MP_ASINS_ESSENTIALS dma
LEFT JOIN D_MP_MEDIA_ASINS dmma
ON dma.MARKETPLACE_ID = dmma.MARKETPLACE_ID
AND dma.ASIN = dmma.ASIN
WHERE dma.REGION_ID = 1
AND dma.MARKETPLACE_ID = 1
AND dma.ASIN IN ('0596006322','B00167YLVA','B004GEB67C');
ASIN         ITEM_NAME                                                                       TEXTBOOK_TYPE   IS_TEXTBOOK
B004GEB67C   Beginning SQL Joes 2 Pros: (SQL Exam Prep Series 70-433 Volume 1 of 5) (DVD)                    N
B00167YLVA   Fiskars SQL-7312 Squeeze Paper Punch, Large, Comma, Comma, Chameleon                            N
0596006322   Mastering Oracle SQL, 2nd Edition                                               unknown         Y
Notice that a LEFT JOIN was used, because not all ASINs are found in the D_MP_MEDIA_ASINS table.
Data Type Consistency
When using functions like NVL(), DECODE(), CASE and NVL2() that convert values, it’s important to keep data types in mind (meaning
character strings, dates and numbers). These functions may fail if you mix data types in the outputs. For example, the query below
mixes numerical values (15+2) with text strings (‘N’) in the NVL2() function, resulting in an ORA-01722: invalid number error.
SELECT /*+ use_hash(dma,dmma) */
dma.ASIN
, dma.ITEM_NAME
, dmma.TEXTBOOK_TYPE
, NVL2(dmma.TEXTBOOK_TYPE,15+2,'N')
FROM D_MP_ASINS_ESSENTIALS dma
LEFT JOIN D_MP_MEDIA_ASINS dmma
ON dma.MARKETPLACE_ID = dmma.MARKETPLACE_ID
AND dma.ASIN = dmma.ASIN
WHERE dma.REGION_ID = 1
AND dma.MARKETPLACE_ID = 1
AND dma.ASIN IN ('0596006322','B00167YLVA','B004GEB67C');
So when using these helpful functions, be sure to keep their outputs all of the same data type.
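For example, a sketch of one way to fix the failing query above is to convert the number to a character string yourself with TO_CHAR(), so both NVL2() outputs have the same data type:
SELECT /*+ use_hash(dma,dmma) */
dma.ASIN
, NVL2(dmma.TEXTBOOK_TYPE,TO_CHAR(15+2),'N')   -- both outputs are now character strings
FROM D_MP_ASINS_ESSENTIALS dma
LEFT JOIN D_MP_MEDIA_ASINS dmma
ON dma.MARKETPLACE_ID = dmma.MARKETPLACE_ID
AND dma.ASIN = dmma.ASIN
WHERE dma.REGION_ID = 1
AND dma.MARKETPLACE_ID = 1
AND dma.ASIN IN ('0596006322','B00167YLVA','B004GEB67C');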
Week 7 Homework:
1. Read Chapter 9 in Mastering Oracle SQL, through page 219.
2. Write a query to pull a list of all Purchase Orders placed last week for vendor code SIMON (or your favorite vendor code) in the US, using a DECODE statement to translate the Order Type number to the 2-letter code.
3. Add a CASE statement to the query, and when the Deal Code field isn’t blank, return ‘Deal Buy PO’. Otherwise, return ‘Auto’ for any POs that are of order types DS, SP, PD, SU, LA, LD, or NP, and ‘Manual’ for any other POs. (hint: Use the IN operator to avoid having to enter so many WHEN/THEN combinations.)
4. Schedule the query to run every week for the previous week, and publish to a text file.
5. Link the output of your file to an Excel spreadsheet, so that you can update it every week with the new data.
6. If you haven’t already, go back to Homework #2 from Week 2: Make sure you’re signed up to the etl-users@amazon.com mailing list. This is vital not only as a resource for you when you run into trouble, but as a way to ensure you’re notified when significant changes to the ETL Manager or to specific tables are going to occur. Sign up, create a rule to move all the messages to a specific folder, and check that folder every so often. Read through the emails periodically to see what you can learn. And when you see questions you know the answer to, help out the other folks in the etl-users community.
Continuing Learning:
1. Think of what types of data, if you had it at your fingertips in a report, would make your job easier, and do one or all of the following:
a. Pick one and use BI Metadata to try to find the tables & columns you need. Create an ETL Job, scheduling it to run daily, weekly, or monthly using wildcards to ensure it will always include the data you need. Link the output file to an Excel spreadsheet. Show your boss what you’ve done.
b. Pick one that you think might be similar to a report already in existence, and ask the owner of that report for a copy of their SQL. Edit the SQL to fit your use case and set it up to run to meet your needs.
c. Pick one and email the etl-user@amazon.com mailing list to see if someone has a similar report that you could use as a starting point.
2. Using Google, Mastering Oracle SQL, and other resources, keep learning new functions and operators to expand what you can do with SQL. I recommend reading up on UPPER(), TRIM(), RTRIM(), LTRIM(), SUBSTR(), COALESCE(), RANK(), PARTITION and PARTITION BY, WITH, EXISTS, UNION and UNION ALL.
Answer Key
Below are possible answers to the weekly homework assignments. There are almost always multiple ways to write the SQL to get
the correct answer, so these answers present only one of those possibilities.
It’s recommended that you attempt all the homework exercises prior to looking at the answers. If you get stuck, be sure you’ve read
the corresponding chapter in Mastering Oracle SQL, and reread that week’s lesson. Some of the homework exercises use functions, operators, and other code that was taught in prior weeks, so you may want to refer back to earlier weeks’ lessons if
something doesn’t seem familiar or isn’t found in the chapter and lesson for that week’s homework. Also, remember to use the BI
Metadata, Explain Plan, and Wikis as references to help you with your homework exercises. Some of the exercises are specifically
designed to encourage your use of these resources, as they will be vital to your success at writing SQL at Amazon. Good Luck!
Week 1 – The Basics of ETL Manager and Basic Structure of SQL
2. SELECT d_w.WAREHOUSE_ID FC
FROM D_WAREHOUSES d_w
ORDER BY d_w.WAREHOUSE_ID;
3. SELECT d_w.WAREHOUSE_ID FC
, d_w.REGION_ID
, d_w.REGION_ID * 10 CALC
, 10 FACTOR
, d_w.WAREHOUSE_ID || '_' || d_w.REGION_ID FC_REGION
FROM D_WAREHOUSES d_w
ORDER BY d_w.WAREHOUSE_ID;
Week 2 – Exploring Tables and Building Queries to Pull Just the Results You Want
3. SELECT d_w.WAREHOUSE_ID
FROM D_WAREHOUSES d_w
WHERE d_w.REGION_ID = 1
AND d_w.HAS_AMAZON_INVENTORY = 'Y';
4. SELECT d_w.WAREHOUSE_ID
, d_w.NAME "FC Name"
FROM D_WAREHOUSES d_w
WHERE d_w.REGION_ID = 1
AND d_w.HAS_AMAZON_INVENTORY = 'Y'
AND d_w.NAME LIKE '%Logistics%';
6. SELECT pg.PRODUCT_GROUP
, NVL(pg.SHORT_DESC,'Unknown')
FROM PRODUCT_GROUPS pg
ORDER BY pg.PRODUCT_GROUP;
7. SELECT pg.PRODUCT_GROUP
, NVL(pg.SHORT_DESC,'Unknown')
, pg.DESCRIPTION
FROM PRODUCT_GROUPS pg
WHERE pg.PRODUCT_GROUP >= 14
AND pg.DESCRIPTION IN ('Books','Universal','Shops','Advertising','Art')
ORDER BY pg.PRODUCT_GROUP;
Week 3 –Partitions, Scheduling Jobs to Publish to Folders, Using the Job Run Wildcard, Aggregate Queries and HAVING Clause
2. D_DISTRIBUTOR_ORDERS is partitioned by REGION_ID.
3. SELECT /*+ use_hash(ddo) */
COUNT(ddo.ORDER_ID)
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.LEGAL_ENTITY_ID = 101
AND ddo.DISTRIBUTOR_ID = 'PRBRC';
4. SELECT /*+ use_hash(ddo) */
COUNT(ddo.ORDER_ID)
, MIN(ddo.ORDER_DAY)
, MAX(ddo.ORDER_DAY)
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.LEGAL_ENTITY_ID = 101
AND ddo.DISTRIBUTOR_ID = 'PRBRC';
5. SELECT /*+ use_hash(ddo) */
COUNT(ddo.ORDER_ID)
, MIN(ddo.ORDER_DAY)
, MAX(ddo.ORDER_DAY)
, COUNT(DISTINCT ddo.ORDER_DAY)
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.LEGAL_ENTITY_ID = 101
AND ddo.DISTRIBUTOR_ID = 'PRBRC';
6. SELECT /*+ use_hash(ddo) */
COUNT(ddo.ORDER_ID)
, MIN(ddo.ORDER_DAY)
, MAX(ddo.ORDER_DAY)
, COUNT(DISTINCT ddo.ORDER_DAY)
, SUM(ddo.SHIPPING_COST)
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.LEGAL_ENTITY_ID = 101
AND ddo.DISTRIBUTOR_ID = 'PRBRC';
7. SELECT /*+ use_hash(ddo) */
ddo.DISTRIBUTOR_ID
, COUNT(ddo.ORDER_ID)
, MIN(ddo.ORDER_DAY)
, MAX(ddo.ORDER_DAY)
, COUNT(DISTINCT ddo.ORDER_DAY)
, SUM(ddo.SHIPPING_COST)
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.LEGAL_ENTITY_ID = 101
AND ddo.DISTRIBUTOR_ID = 'PRBRC'
GROUP BY ddo.DISTRIBUTOR_ID;
8. SELECT /*+ use_hash(ddo) */
ddo.DISTRIBUTOR_ID
, ddo.HANDLER
, COUNT(ddo.ORDER_ID)
, MIN(ddo.ORDER_DAY)
, MAX(ddo.ORDER_DAY)
, COUNT(DISTINCT ddo.ORDER_DAY)
, SUM(ddo.SHIPPING_COST)
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.LEGAL_ENTITY_ID = 101
AND ddo.DISTRIBUTOR_ID = 'PRBRC'
GROUP BY ddo.DISTRIBUTOR_ID
, ddo.HANDLER;
9. SELECT /*+ use_hash(ddo) */
ddo.DISTRIBUTOR_ID
, ddo.HANDLER
, COUNT(ddo.ORDER_ID)
, MIN(ddo.ORDER_DAY)
, MAX(ddo.ORDER_DAY)
, COUNT(DISTINCT ddo.ORDER_DAY)
, SUM(ddo.SHIPPING_COST)
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.LEGAL_ENTITY_ID = 101
AND ddo.DISTRIBUTOR_ID = 'PRBRC'
GROUP BY ddo.DISTRIBUTOR_ID
, ddo.HANDLER
HAVING COUNT(ddo.ORDER_ID) BETWEEN 30 AND 40;
10. SELECT /*+ use_hash(ddo) */
ddo.DISTRIBUTOR_ID
, ddo.HANDLER
, COUNT(ddo.ORDER_ID)
, MIN(ddo.ORDER_DAY)
, MAX(ddo.ORDER_DAY)
, COUNT(DISTINCT ddo.ORDER_DAY)
, SUM(ddo.SHIPPING_COST)
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.LEGAL_ENTITY_ID = 101
AND ddo.DISTRIBUTOR_ID = 'PRBRC'
GROUP BY ddo.DISTRIBUTOR_ID
, ddo.HANDLER
HAVING COUNT(ddo.ORDER_ID) BETWEEN 30 AND 40
AND COUNT(DISTINCT ddo.ORDER_DAY) >= 10;
Week 4 - Joining Tables
2. Both tables are partitioned by REGION_ID & MARKETPLACE_ID and are at the REGION_ID/MARKETPLACE_ID/ASIN grain. Both
tables include the column ITEM_NAME, though D_MP_ASINS_ESSENTIALS is considered the authority table for this information.
D_ASINS_MARKETPLACE_ATTRIBUTES includes the column BINDING.
3. SELECT /*+ use_hash(dma,da) */
da.ASIN
, dma.ITEM_NAME
, da.BINDING
FROM D_ASINS_MARKETPLACE_ATTRIBUTES da
JOIN D_MP_ASINS_ESSENTIALS dma
ON da.ASIN = dma.ASIN
WHERE da.ASIN = '0385240880'
AND da.REGION_ID = 1
AND da.MARKETPLACE_ID = 1
AND dma.REGION_ID = 1
AND dma.MARKETPLACE_ID = 1;
4. SELECT /*+ use_hash(dma,da,dmam) */
da.ASIN
, dma.ITEM_NAME
, da.BINDING
, dmam.MANUFACTURER_CODE
FROM D_ASINS_MARKETPLACE_ATTRIBUTES da
JOIN D_MP_ASINS_ESSENTIALS dma
ON da.ASIN = dma.ASIN
JOIN D_MP_ASIN_MANUFACTURER dmam
ON da.ASIN = dmam.ASIN
WHERE da.ASIN = '0385240880'
AND da.REGION_ID = 1
AND da.MARKETPLACE_ID = 1
AND dma.REGION_ID = 1
AND dma.MARKETPLACE_ID = 1
AND dmam.MARKETPLACE_ID = 1;
5. SELECT /*+ use_hash(v,o_abg) */
v.PRIMARY_VENDOR_CODE
, v.VENDOR_NAME
, v.AMAZON_BUSINESS_GROUP_ID
, o_abg.TYPE
FROM VENDORS v
JOIN O_AMAZON_BUSINESS_GROUPS o_abg
ON v.AMAZON_BUSINESS_GROUP_ID = o_abg.ID
WHERE v.VENDOR_ID = 3453;
6. SELECT /*+ use_hash(v,o_abg) */
o_abg.ID
, o_abg.TYPE
FROM VENDORS v
RIGHT OUTER JOIN O_AMAZON_BUSINESS_GROUPS o_abg
ON v.AMAZON_BUSINESS_GROUP_ID = o_abg.ID
WHERE v.VENDOR_ID IS NULL
AND o_abg.TYPE LIKE 'CA%';
7. SELECT /*+ use_hash(dsi) */
dsi.ORDER_ID
, dsi.ISBN
, dsi.QUANTITY_UNPACKED
FROM D_DISTRIBUTOR_SHIPMENT_ITEMS dsi
WHERE dsi.REGION_ID = 1
AND dsi.LEGAL_ENTITY_ID = 101
AND dsi.RECEIVED_DAY = TO_DATE('{RUN_DATE_YYYYMMDD}','YYYYMMDD')
AND dsi.DISTRIBUTOR_ID = 'BATBO';
Week 5 – Dealing with Dates in SQL and Using the Run Date Wildcard
2. SELECT /*+ use_hash(d_w) */
d_w.WAREHOUSE_ID
, TO_CHAR(d_w.DW_CREATION_DATE,'MM/DD/YYYY')
FROM D_WAREHOUSES d_w
WHERE d_w.WAREHOUSE_ID = 'PHL1';
3. SELECT /*+ use_hash(d_w) */
d_w.WAREHOUSE_ID
, TO_CHAR(d_w.DW_CREATION_DATE,'MM/DD/YYYY')
, TO_CHAR(TRUNC(d_w.DW_CREATION_DATE,'D'),'MM/DD/YYYY')
, TO_CHAR(TRUNC(d_w.DW_CREATION_DATE,'D')+6,'MM/DD/YYYY')
FROM D_WAREHOUSES d_w
WHERE d_w.WAREHOUSE_ID = 'PHL1';
4. SELECT /*+ use_hash(d_w) */
d_w.WAREHOUSE_ID
, d_w.DW_CREATION_DATE
FROM D_WAREHOUSES d_w
WHERE TRUNC(d_w.DW_CREATION_DATE) <> TO_DATE('20090120','YYYYMMDD')
AND d_w.REGION_ID <> 1;
5. SELECT /*+ use_hash(d_w) */
d_w.WAREHOUSE_ID
, d_w.DW_CREATION_DATE
, TO_CHAR(d_w.DW_CREATION_DATE,'Day')
FROM D_WAREHOUSES d_w
WHERE TRUNC(d_w.DW_CREATION_DATE) <> TO_DATE('20090120','YYYYMMDD')
AND d_w.REGION_ID <> 1;
6. SELECT /*+ use_hash(doi) */
doi.ORDER_ID
, doi.ISBN
, doi.QUANTITY_SUBMITTED
, doi.ORDER_DAY
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
WHERE doi.REGION_ID = 1
AND doi.LEGAL_ENTITY_ID = 101
AND doi.ORDER_DAY BETWEEN TO_DATE('20090101','YYYYMMDD')
AND TO_DATE('20090115','YYYYMMDD')
AND doi.DISTRIBUTOR_ID = 'DCCOM';
7. SELECT /*+ use_hash(doi) */
doi.ISBN
, SUM(doi.QUANTITY_SUBMITTED)
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
WHERE doi.REGION_ID = 1
AND doi.LEGAL_ENTITY_ID = 101
AND doi.ORDER_DAY BETWEEN TO_DATE('20090101','YYYYMMDD')
AND TO_DATE('20090115','YYYYMMDD')
AND doi.DISTRIBUTOR_ID = 'DCCOM'
GROUP BY doi.ISBN;
8. SELECT /*+ use_hash(dma) */
dma.ASIN
, dma.ITEM_NAME
, dma.STREET_DAY
, dma.PUBLICATION_DAY
FROM D_MP_ASINS_ESSENTIALS dma
WHERE dma.REGION_ID = 1
AND dma.MARKETPLACE_ID = 1
AND dma.GL_PRODUCT_GROUP = 14
AND dma.PUBLICATION_DAY >= TO_DATE('20200101','YYYYMMDD');
9. SELECT /*+ use_hash(dma) */
dma.ASIN
, dma.ITEM_NAME
, dma.STREET_DAY
, dma.PUBLICATION_DAY
, NVL(dma.STREET_DAY,dma.PUBLICATION_DAY)
FROM D_MP_ASINS_ESSENTIALS dma
WHERE dma.REGION_ID = 1
AND dma.MARKETPLACE_ID = 1
AND dma.GL_PRODUCT_GROUP = 14
AND dma.PUBLICATION_DAY >= TO_DATE('20200101','YYYYMMDD') ;
10. SELECT /*+ use_hash(ddo,doi) */
ddo.ORDER_ID
, ddo.ORDER_TYPE
, ddo.DISTRIBUTOR_ID
, COUNT(doi.ISBN)
, SUM(doi.QUANTITY_SUBMITTED)
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
JOIN D_DISTRIBUTOR_ORDERS ddo
ON doi.ORDER_ID = ddo.ORDER_ID
WHERE doi.REGION_ID = 1
AND doi.LEGAL_ENTITY_ID = 101
AND doi.ORDER_DAY = TO_DATE('{RUN_DATE_YYYYMMDD}','YYYYMMDD')
AND ddo.REGION_ID = 1
AND ddo.HANDLER = 'username'
GROUP BY
ddo.ORDER_ID
, ddo.ORDER_TYPE
, ddo.DISTRIBUTOR_ID;
Week 6 – Subqueries, Segments, and More Wildcards
3. SELECT DISTINCT ASIN FROM PRODUCT_SEGMENT_MEMBERSHIP WHERE SEGMENT_ID = 623271;
4. SELECT /*+ use_hash(doi,seg) */
doi.ORDER_ID
, doi.DISTRIBUTOR_ID
, doi.ISBN
, doi.QUANTITY_ORDERED
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
JOIN (SELECT
DISTINCT ASIN
FROM PRODUCT_SEGMENT_MEMBERSHIP
WHERE SEGMENT_ID = 623271) seg
ON doi.ISBN = seg.ASIN
WHERE doi.REGION_ID = 1
AND doi.LEGAL_ENTITY_ID = 101
AND doi.ORDER_DAY = TO_DATE('20090406','YYYYMMDD');
5. SELECT /*+ use_hash(doi,seg) */
doi.ORDER_ID
, doi.DISTRIBUTOR_ID
, doi.ISBN
, doi.QUANTITY_ORDERED
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
JOIN (SELECT
DISTINCT ASIN
FROM PRODUCT_SEGMENT_MEMBERSHIP
WHERE SEGMENT_ID = 623271) seg
ON doi.ISBN = seg.ASIN
WHERE doi.REGION_ID = 1
AND doi.LEGAL_ENTITY_ID = {LEGAL_ENTITY_ID}
AND doi.ORDER_DAY = TO_DATE('20090406','YYYYMMDD');
6. SELECT /*+ use_hash(doi,seg) */
doi.ORDER_ID
, doi.DISTRIBUTOR_ID
, doi.ISBN
, doi.QUANTITY_ORDERED
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
JOIN (SELECT
DISTINCT ASIN
FROM PRODUCT_SEGMENT_MEMBERSHIP
WHERE SEGMENT_ID = {FREE_FORM}) seg
ON doi.ISBN = seg.ASIN
WHERE doi.REGION_ID = 1
AND doi.LEGAL_ENTITY_ID = {LEGAL_ENTITY_ID}
AND doi.ORDER_DAY = TO_DATE('20090406','YYYYMMDD');
7. SELECT /*+ use_hash(doi,v) */
doi.ORDER_ID
, doi.DISTRIBUTOR_ID
, v.VENDOR_NAME
, doi.ISBN
, doi.QUANTITY_ORDERED
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
JOIN (SELECT
DISTINCT ASIN
FROM PRODUCT_SEGMENT_MEMBERSHIP
WHERE SEGMENT_ID = {FREE_FORM}) seg
ON doi.ISBN = seg.ASIN
JOIN VENDORS v
ON doi.DISTRIBUTOR_ID = v.PRIMARY_VENDOR_CODE
WHERE doi.REGION_ID = 1
AND doi.LEGAL_ENTITY_ID = {LEGAL_ENTITY_ID}
AND doi.ORDER_DAY = TO_DATE('20090406','YYYYMMDD');
You might notice that when you join to VENDORS, you lose the rows for all Vendor Codes that are less than 5 characters long, like BTM.
This is because the Vendor Code is stored in the DISTRIBUTOR_ID field of D_DISTRIBUTOR_ORDER_ITEMS with trailing spaces,
whereas the PRIMARY_VENDOR_CODE field in VENDORS doesn't include those spaces, so the join doesn't find a match between 'BTM  '
and 'BTM'. This can be remedied with a function called RTRIM(), which trims spaces off the RIGHT side of whatever column you
put in the function. We can rewrite the query to use RTRIM() in the JOIN, trimming the spaces off the DISTRIBUTOR_ID
field when joining it to the PRIMARY_VENDOR_CODE field, so the two match even for Vendor Codes that are 2, 3, or 4 characters long.
SELECT /*+ use_hash(doi,v) */
doi.ORDER_ID
, doi.DISTRIBUTOR_ID
, v.VENDOR_NAME
, doi.ISBN
, doi.QUANTITY_ORDERED
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
JOIN (SELECT
DISTINCT ASIN
FROM PRODUCT_SEGMENT_MEMBERSHIP
WHERE SEGMENT_ID = {FREE_FORM}) seg
ON doi.ISBN = seg.ASIN
JOIN VENDORS v
ON RTRIM(doi.DISTRIBUTOR_ID) = v.PRIMARY_VENDOR_CODE
WHERE doi.REGION_ID = 1
AND doi.LEGAL_ENTITY_ID = {LEGAL_ENTITY_ID}
AND doi.ORDER_DAY = TO_DATE('20090406','YYYYMMDD');
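If you want to see what RTRIM() does in isolation, here is a minimal sketch you can run against Oracle's DUAL dummy table. The padded 'BTM  ' literal is just a stand-in for a space-padded Vendor Code, not a real lookup:
SELECT 'BTM  '                  AS RAW_CODE        -- value with trailing spaces
, RTRIM('BTM  ')                AS TRIMMED_CODE    -- returns 'BTM'
, LENGTH('BTM  ')               AS RAW_LENGTH      -- 5, counting the trailing spaces
, LENGTH(RTRIM('BTM  '))        AS TRIMMED_LENGTH  -- 3
FROM DUAL;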
Check out https://w.amazon.com/?RTrimVendorCode for more information on RTRIM(), including which tables do and don't include
those trailing spaces on the Vendor Code column.
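If you're not sure whether a particular table pads its Vendor Code, a quick way to check is to compare the raw and trimmed lengths of the column. This is only a sketch, reusing the D_DISTRIBUTOR_ORDER_ITEMS table and filters from the answers above; swap in whichever table and date you're investigating:
SELECT DISTINCT doi.DISTRIBUTOR_ID
, LENGTH(doi.DISTRIBUTOR_ID)        AS RAW_LENGTH      -- includes any trailing spaces
, LENGTH(RTRIM(doi.DISTRIBUTOR_ID)) AS TRIMMED_LENGTH  -- trailing spaces removed
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
WHERE doi.REGION_ID = 1
AND doi.ORDER_DAY = TO_DATE('20090406','YYYYMMDD');
If RAW_LENGTH is larger than TRIMMED_LENGTH, that table pads the Vendor Code and you'll need RTRIM() in your join.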
8. SELECT /*+ use_hash(doi,v,dma,seg) */
doi.ORDER_ID
, doi.DISTRIBUTOR_ID
, v.VENDOR_NAME
, doi.ISBN
, dma.ITEM_NAME
, doi.QUANTITY_ORDERED
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
JOIN (SELECT
DISTINCT ASIN
FROM PRODUCT_SEGMENT_MEMBERSHIP
WHERE SEGMENT_ID = {FREE_FORM}) seg
ON doi.ISBN = seg.ASIN
JOIN VENDORS v
ON doi.DISTRIBUTOR_ID = v.PRIMARY_VENDOR_CODE
JOIN D_MP_ASINS_ESSENTIALS dma
ON doi.ISBN = dma.ASIN
WHERE doi.REGION_ID = 1
AND doi.LEGAL_ENTITY_ID = {LEGAL_ENTITY_ID}
AND doi.ORDER_DAY = TO_DATE('20090406','YYYYMMDD')
AND dma.REGION_ID = 1
AND dma.MARKETPLACE_ID = 1;
9. SELECT /*+ use_hash(doi,v,dma,seg) */
doi.DISTRIBUTOR_ID
, v.VENDOR_NAME
, doi.ISBN
, dma.ITEM_NAME
, SUM(doi.QUANTITY_ORDERED)
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
JOIN (SELECT
DISTINCT ASIN
FROM PRODUCT_SEGMENT_MEMBERSHIP
WHERE SEGMENT_ID = {FREE_FORM}) seg
ON doi.ISBN = seg.ASIN
JOIN VENDORS v
ON doi.DISTRIBUTOR_ID = v.PRIMARY_VENDOR_CODE
JOIN D_MP_ASINS_ESSENTIALS dma
ON doi.ISBN = dma.ASIN
WHERE doi.REGION_ID = 1
AND doi.LEGAL_ENTITY_ID = {LEGAL_ENTITY_ID}
AND doi.ORDER_DAY = TO_DATE('20090406','YYYYMMDD')
AND dma.REGION_ID = 1
AND dma.MARKETPLACE_ID = 1
GROUP BY
doi.DISTRIBUTOR_ID
, v.VENDOR_NAME
, doi.ISBN
, dma.ITEM_NAME;
10. SELECT /*+ use_hash(doi,v,dma,seg) */
doi.DISTRIBUTOR_ID
, v.VENDOR_NAME
, doi.ISBN
, dma.ITEM_NAME
, SUM(doi.QUANTITY_ORDERED)
FROM D_DISTRIBUTOR_ORDER_ITEMS doi
JOIN (SELECT
DISTINCT
ASIN
FROM PRODUCT_SEGMENT_MEMBERSHIP
WHERE SEGMENT_ID = {FREE_FORM}) seg
ON doi.ISBN = seg.ASIN
JOIN VENDORS v
ON doi.DISTRIBUTOR_ID = v.PRIMARY_VENDOR_CODE
JOIN D_MP_ASINS_ESSENTIALS dma
ON doi.ISBN = dma.ASIN
WHERE doi.REGION_ID = 1
AND doi.LEGAL_ENTITY_ID = {LEGAL_ENTITY_ID}
AND doi.ORDER_DAY = TO_DATE('20090406','YYYYMMDD')
AND dma.REGION_ID = 1
AND dma.MARKETPLACE_ID = 1
GROUP BY
doi.DISTRIBUTOR_ID
, v.VENDOR_NAME
, doi.ISBN
, dma.ITEM_NAME
HAVING SUM(doi.QUANTITY_ORDERED) > 3;
Week 7 – DECODE & CASE, Troubleshooting, Stealing SQL from DSS Queries, and Linking ETL Output to Excel
2. SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, DECODE(ddo.ORDER_TYPE, 0,'DS', 1,'OP', 2,'SP', 3,'PB', 4,'PD', 6,'SU', 7,'IS', 8,'MS', 9,'LA',
         10,'LB', 11,'LC', 12,'LD', 13,'SA', 14,'SB', 15,'SC', 16,'SD', 17,'NP', 18,'RE', 19,'VP',
         20,'MU', 21,'T1', 22,'T2', 23,'T3', 24,'B1', 25,'B2', 26,'B3', 27,'M1', 28,'M2', 29,'M3',
         30,'R1', 31,'R2', 32,'R3', 33,'PT', 34,'DR', 35,'MX', ddo.ORDER_TYPE) AS ORDER_TYPE
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.LEGAL_ENTITY_ID = 101
AND ddo.DISTRIBUTOR_ID = 'SIMON'
AND ddo.ORDER_DAY BETWEEN TO_DATE('{RUN_DATE_YYYYMMDD}','YYYYMMDD')-6 AND
TO_DATE('{RUN_DATE_YYYYMMDD}','YYYYMMDD');
3. SELECT /*+ use_hash(ddo) */
ddo.ORDER_ID
, DECODE(ddo.ORDER_TYPE, 0,'DS', 1,'OP', 2,'SP', 3,'PB', 4,'PD', 6,'SU', 7,'IS', 8,'MS', 9,'LA',
         10,'LB', 11,'LC', 12,'LD', 13,'SA', 14,'SB', 15,'SC', 16,'SD', 17,'NP', 18,'RE', 19,'VP',
         20,'MU', 21,'T1', 22,'T2', 23,'T3', 24,'B1', 25,'B2', 26,'B3', 27,'M1', 28,'M2', 29,'M3',
         30,'R1', 31,'R2', 32,'R3', 33,'PT', 34,'DR', 35,'MX', ddo.ORDER_TYPE) ORDER_TYPE
, CASE WHEN ddo.DEAL_CODE IS NOT NULL THEN 'Deal Buy PO'
WHEN ddo.ORDER_TYPE IN (0,2,4,6,9,12,17) THEN 'Auto'
ELSE 'Manual'
END ORDER_METHOD
FROM D_DISTRIBUTOR_ORDERS ddo
WHERE ddo.REGION_ID = 1
AND ddo.LEGAL_ENTITY_ID = 101
AND ddo.DISTRIBUTOR_ID = 'SIMON'
AND ddo.ORDER_DAY BETWEEN TO_DATE('{RUN_DATE_YYYYMMDD}','YYYYMMDD')-6 AND
TO_DATE('{RUN_DATE_YYYYMMDD}','YYYYMMDD');