BIT2008 - Filters & Aggregation

Slides by James Brunet
Based on slides from David Sprague and Chris Joslin
Table of contents
01 What is SQL
02 Filters
03 Aggregation
01
What is SQL?
Imperative vs Declarative
Imperative programming is when you instruct the computer,
step by step, how to perform a task.
The code you write describes how you want a task to be
performed.
Many programming languages you are familiar with (C, C++,
Python) are imperative programming languages.
Imperative vs Declarative
Declarative programming is when you describe what you want
without describing the step-by-step process. You don't have to
say how the computer gets the result.
SQL is a declarative programming language.
(So is Prolog. Python even supports some declarative features, but is primarily imperative)
Example: Displaying Positive Numbers
numbers = [-2, -1, 0, 1, 2, 3]
positives = []
# Imperative: walk the list step by step and collect matching values
for num in numbers:
    if num > 0:
        positives.append(num)
print(positives)
Example: Displaying Positive Numbers
Assume we have a table named numbers with a single column
value that contains integers.
SELECT value FROM numbers WHERE value > 0;
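To make the contrast concrete, the sketch below runs the declarative query from Python using the standard-library sqlite3 module. The in-memory database and the way the table is filled are assumptions for illustration; only the table name, column name, and WHERE condition come from the slide. An ORDER BY is added so the printed order is predictable.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (value INTEGER)")
conn.executemany(
    "INSERT INTO numbers (value) VALUES (?)",
    [(n,) for n in [-2, -1, 0, 1, 2, 3]],
)

# Declarative: we state the condition; the database decides how to evaluate it.
rows = conn.execute(
    "SELECT value FROM numbers WHERE value > 0 ORDER BY value"
).fetchall()
positives = [value for (value,) in rows]
print(positives)
conn.close()
```

Unlike the loop version, there is no explicit iteration or append step here; the query only describes which rows are wanted.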
SQL Introduction
A SQL statement typically performs one of the following operations:
● Extracting specific data from a database
● Inserting new data into a database
● Modifying existing data in a database
● Deleting data from a database
SQL Introduction
SQL is divided into three key components:
● Data Manipulation Language (DML) – The most commonly used component; it provides the four basic kinds of access to a database (extracting, inserting, modifying, and deleting data)
● Data Definition Language (DDL) – Used to create, modify, or remove tables and other database objects
● Data Control Language (DCL) – Used to manage database security and user access to tables and data
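As a rough illustration of the first two components, the sketch below uses Python's sqlite3 module to issue one DDL statement followed by the four DML operations. The courses table, its columns, and its rows are hypothetical. DCL statements such as GRANT are not shown because SQLite has no user accounts to grant access to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define a table (hypothetical name and columns).
conn.execute("CREATE TABLE courses (code TEXT PRIMARY KEY, title TEXT)")

# DML: insert new rows.
conn.executemany(
    "INSERT INTO courses (code, title) VALUES (?, ?)",
    [("C100", "Intro"), ("C200", "Draft title"), ("C300", "Retired")],
)
# DML: modify an existing row.
conn.execute("UPDATE courses SET title = 'Advanced' WHERE code = 'C200'")
# DML: delete a row.
conn.execute("DELETE FROM courses WHERE code = 'C300'")
# DML: extract data.
courses = conn.execute("SELECT code, title FROM courses ORDER BY code").fetchall()
print(courses)
conn.close()
```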
SQL Introduction
SQL is used to write a structured statement that “queries” a database
That is, SQL is used to ask “questions” and have the “answers” returned in the form of data from the database
Example: for a University Course database, we might ask the question “Which courses are offered?”
The answer would be the list of courses
SELECT
The wildcard “*” stands for every column; used alone after “SELECT”, it means “all columns”
Example: to retrieve all data from the table “AddressBook”:
SELECT * FROM AddressBook;
“FROM” is the keyword that indicates which table the data comes from
SELECT
In the previous example, the table was specified, but not the database or schema. (Reminder: schemas are like folders for tables)
To specify the database and schema as well as the table, use dot notation (“.”) in the query, in the order database.schema.table
Example:
SELECT * FROM jamesbrunet.public.addresses;
SELECT
Table and column names that do not follow the usual naming conventions can cause problems
A table name containing spaces, such as “My Table”, must be enclosed in double quotes so that it is read as a single name
Example:
SELECT * FROM "My Table";
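The effect of quoting can be checked with a small sqlite3 sketch. The table contents are hypothetical; the point is only that the double-quoted name works while the unquoted form is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Double quotes make the two words a single identifier.
conn.execute('CREATE TABLE "My Table" (id INTEGER)')
conn.execute('INSERT INTO "My Table" (id) VALUES (1), (2)')
rows = conn.execute('SELECT * FROM "My Table" ORDER BY id').fetchall()

# Without quotes the statement no longer refers to the table "My Table".
try:
    conn.execute("SELECT * FROM My Table")
    unquoted_ok = True
except sqlite3.OperationalError:
    unquoted_ok = False

print(rows, unquoted_ok)
conn.close()
```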
SELECT
To access a single column, specify its name in place of the wildcard “*” after “SELECT”
Example: SELECT Name FROM AddressBook;
Name is the name of the column in the table
AddressBook is the name of the table
SELECT
Multiple Columns
It is equally possible to retrieve information from multiple columns by separating the column names with commas
Example:
SELECT Name, City FROM AddressBook;
The data from columns are returned in the order requested
SELECT: ALIASES
Aliases are used to provide another name to a column or table in
order to make it easier to read an SQL statement
SELECT name AS first_name, surname AS last_name FROM AddressBook;
SELECT: ALIASES
Aliases can be applied to Tables as well
SELECT ab.name AS first_name, ab.surname AS last_name FROM AddressBook ab;
SELECT: ALIASES
Aliases are practical when returning calculations
SELECT CONCAT(name, ' ', surname) as fullname from
addressbook
Output:
Fullname
========
Jimmy Brunet
Abel Charlebois
...
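A minimal sqlite3 sketch of aliasing a calculated column follows. One assumption to flag: it uses the standard `||` concatenation operator instead of CONCAT, because CONCAT is only available in newer SQLite versions. The two rows are taken from the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addressbook (name TEXT, surname TEXT)")
conn.executemany(
    "INSERT INTO addressbook (name, surname) VALUES (?, ?)",
    [("Jimmy", "Brunet"), ("Abel", "Charlebois")],
)

# The alias gives the computed column a readable name in the result.
cursor = conn.execute(
    "SELECT name || ' ' || surname AS fullname FROM addressbook ORDER BY surname"
)
column_name = cursor.description[0][0]
fullnames = [row[0] for row in cursor.fetchall()]
print(column_name, fullnames)
conn.close()
```

Without the alias, the result column would be labelled with the raw expression text, which is harder to read and to refer to.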
SELECT: ORDER BY
ORDER BY allows us to sort. By default, it is in ascending order.
SELECT name as first_name, surname as last_name from
addressbook order by last_name
Output:
first_name | last_name
===========|==========
Joseph     | Aaronson
Jimmy      | Brunet
Abel       | Charlebois
Zoey       | Larsen
SELECT: ORDER BY
You can also sort by descending order
SELECT name as first_name, surname as last_name from
addressbook order by last_name DESC
Output:
first_name | last_name
===========|==========
Zoey       | Larsen
Abel       | Charlebois
Jimmy      | Brunet
Joseph     | Aaronson
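The two sort orders can be reproduced with a short sqlite3 sketch. The rows are the four names from the example output; the insertion order is deliberately scrambled to show that ORDER BY, not storage order, determines the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addressbook (name TEXT, surname TEXT)")
conn.executemany(
    "INSERT INTO addressbook (name, surname) VALUES (?, ?)",
    [("Zoey", "Larsen"), ("Joseph", "Aaronson"),
     ("Abel", "Charlebois"), ("Jimmy", "Brunet")],
)

query = "SELECT name AS first_name, surname AS last_name FROM addressbook ORDER BY last_name"
ascending = conn.execute(query).fetchall()             # ascending is the default
descending = conn.execute(query + " DESC").fetchall()  # explicit descending order
print(ascending)
print(descending)
conn.close()
```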
02
Filters
Why filter?
For the queries submitted in previous examples, all of the rows
from the relevant columns were returned
In general, it is often desirable to only return the relevant rows; this
is performed using data filtering
Data Filtering is a powerful part of SQL statements and has many
elements
DISTINCT
“DISTINCT” removes duplicate rows from the result; for a single column, each value that appears in that column is returned only once, no matter how many rows contain it
Example:
SELECT DISTINCT City FROM AddressBook
Output:
London
Ottawa
Chicago
DISTINCT
With multiple columns, “DISTINCT” returns each unique combination of values, rather than the unique values of a single column
Example:
SELECT DISTINCT City, Country FROM AddressBook
Output:
London | Canada
London | United Kingdom
Ottawa | Canada
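The difference between single-column and multi-column DISTINCT can be sketched with sqlite3 as follows. The rows and the Name values are hypothetical, chosen so that the two-column result matches the output above; ORDER BY is added only to make the result order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AddressBook (Name TEXT, City TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO AddressBook (Name, City, Country) VALUES (?, ?, ?)",
    [("A", "London", "Canada"), ("B", "London", "United Kingdom"),
     ("C", "Ottawa", "Canada"), ("D", "London", "Canada"),
     ("E", "Ottawa", "Canada")],
)

# One column: each city appears once.
cities = conn.execute(
    "SELECT DISTINCT City FROM AddressBook ORDER BY City"
).fetchall()
# Two columns: each (City, Country) combination appears once.
pairs = conn.execute(
    "SELECT DISTINCT City, Country FROM AddressBook ORDER BY City, Country"
).fetchall()
print(cities)
print(pairs)
conn.close()
```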
WHERE
The “WHERE” clause permits filtering by a particular condition
Syntax:
SELECT Column FROM Table WHERE <Condition>
The condition syntax can be constructed from various operators
and keywords
WHERE
“WHERE” applies a search condition to the rows in the table. The condition is a combination of operators and values, and a row is returned when the condition evaluates to TRUE for it
Examples:
WHERE LastName = 'Turing' returns all rows where LastName is exactly equal to the string “Turing”
WHERE AccountBalance > 1000 returns all rows where AccountBalance is greater than 1000
WHERE LENGTH(LastName) > 3 AND LENGTH(LastName) < 6 returns all rows where the length of LastName is greater than 3 and less than 6 (4 or 5 characters)
WHERE
“WHERE” can also compare two columns
Example:
SELECT first_name, last_name, address FROM Contacts
WHERE first_name = last_name;
Example output:
first_name | last_name | address
===========|===========|=====================
Mohammad   | Mohammad  | 123 sample st
Nguyen     | Nguyen    | 456 placeholder pl
James      | James     | 789 demonstration dr
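The following sqlite3 sketch tries two kinds of WHERE condition: comparing two columns, and filtering on a computed length. Two of the rows come from the example output; the 'Alan Turing' row and its address are hypothetical additions so that some rows fail each condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Contacts (first_name TEXT, last_name TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO Contacts (first_name, last_name, address) VALUES (?, ?, ?)",
    [("Mohammad", "Mohammad", "123 sample st"),
     ("James", "James", "789 demonstration dr"),
     ("Alan", "Turing", "1 hypothetical rd")],
)

# Condition comparing two columns of the same row.
same_names = conn.execute(
    "SELECT first_name, last_name FROM Contacts "
    "WHERE first_name = last_name ORDER BY first_name"
).fetchall()

# Condition on a computed value: last names of 4 or 5 characters.
short_names = conn.execute(
    "SELECT last_name FROM Contacts "
    "WHERE LENGTH(last_name) > 3 AND LENGTH(last_name) < 6"
).fetchall()
print(same_names)
print(short_names)
conn.close()
```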
Different operators
“=” – Checks if Values are Equal
“!=” – Checks if Values are Not Equal
“<>” – Same as above “!=”
“>” – Greater than
“<” – Less than
“>=” – Greater than or Equal to
“<=” – Less than or Equal to
Combinations
A single condition in a “WHERE” clause might not be enough to return the required data, and it might be necessary to add more conditions
Conditions are linked with the logical operators “AND” and “OR”
Syntax:
SELECT Column FROM Table WHERE Condition1 AND Condition2
SELECT Column FROM Table WHERE Condition1 OR Condition2
Combinations
Example:
SELECT ZipCode, City, State FROM ZipCodes WHERE City = 'New York' OR City = 'Miami';
Here all rows whose City is either New York or Miami are returned
NOT
“NOT” is an additional logical keyword that negates a condition: NOT Condition is TRUE when Condition is not met
Example:
SELECT City, Province FROM PostalCodes WHERE Province = 'ON' AND NOT City = 'Ottawa';
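A small sqlite3 sketch of AND, OR, and NOT is given below. The PostalCodes rows are hypothetical and the table is reduced to two columns for brevity. Note that NOT binds more tightly than AND, and AND more tightly than OR, so parentheses are worth adding whenever conditions are mixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PostalCodes (City TEXT, Province TEXT)")
conn.executemany(
    "INSERT INTO PostalCodes (City, Province) VALUES (?, ?)",
    [("Ottawa", "ON"), ("Toronto", "ON"), ("Kingston", "ON"), ("Montreal", "QC")],
)

# AND NOT: Ontario cities other than Ottawa.
ontario_not_ottawa = [row[0] for row in conn.execute(
    "SELECT City FROM PostalCodes "
    "WHERE Province = 'ON' AND NOT City = 'Ottawa' ORDER BY City"
)]

# OR: rows matching either condition.
ottawa_or_montreal = [row[0] for row in conn.execute(
    "SELECT City FROM PostalCodes "
    "WHERE City = 'Ottawa' OR City = 'Montreal' ORDER BY City"
)]
print(ontario_not_ottawa)
print(ottawa_or_montreal)
conn.close()
```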
Ranges
Combined conditions can be used to perform range tests
Example:
SELECT PartDescription FROM Orders WHERE Ordered >= '2004-10-22' AND Ordered <= '2005-10-22';
You can also use BETWEEN for this purpose; both endpoints are included in the range
SELECT PartDescription FROM Orders WHERE Ordered BETWEEN '2004-10-22' AND '2005-10-22';
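That the two forms select the same rows can be checked with a short sqlite3 sketch. The part names and dates are hypothetical. Dates are stored as ISO-8601 text here, an assumption that works because such strings sort in chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (PartDescription TEXT, Ordered TEXT)")
conn.executemany(
    "INSERT INTO Orders (PartDescription, Ordered) VALUES (?, ?)",
    [("bolt", "2004-10-21"), ("nut", "2004-10-22"), ("gear", "2005-03-01"),
     ("washer", "2005-10-22"), ("spring", "2005-10-23")],
)

# Range test written with two comparisons.
with_comparisons = [row[0] for row in conn.execute(
    "SELECT PartDescription FROM Orders "
    "WHERE Ordered >= '2004-10-22' AND Ordered <= '2005-10-22' ORDER BY Ordered"
)]
# Same range test written with BETWEEN (endpoints included).
with_between = [row[0] for row in conn.execute(
    "SELECT PartDescription FROM Orders "
    "WHERE Ordered BETWEEN '2004-10-22' AND '2005-10-22' ORDER BY Ordered"
)]
print(with_comparisons, with_between)
conn.close()
```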
Ranges
It is also possible to test whether a row's field matches one of a set of values
Example:
SELECT Orders.OrderID, Orders.ContactID FROM Orders WHERE
Orders.OrderDate = #4/15/1999# OR
Orders.OrderDate = #4/15/2000# OR
Orders.OrderDate = #4/21/1999#;
(The #...# date literals are Microsoft Access syntax; most other databases use quoted strings such as '1999-04-15')
IN
In the previous case the clause can become very large, very
quickly
Instead we can use “IN” to create a set
Example: the previous example would be rewritten as
SELECT OrderID, ContactID FROM Orders WHERE OrderDate IN
(#4/15/1999#, #4/15/2000#, #4/21/1999#);
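The equivalence of the chained OR form and the IN form can be sketched with sqlite3. Because the #...# date literals are specific to Microsoft Access, this sketch stores the dates as ISO-8601 text instead; that choice and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, ContactID INTEGER, OrderDate TEXT)")
conn.executemany(
    "INSERT INTO Orders (OrderID, ContactID, OrderDate) VALUES (?, ?, ?)",
    [(1, 10, "1999-04-15"), (2, 11, "1999-04-16"),
     (3, 12, "2000-04-15"), (4, 13, "1999-04-21")],
)

# Set membership written as a chain of OR conditions.
with_or = conn.execute(
    "SELECT OrderID, ContactID FROM Orders "
    "WHERE OrderDate = '1999-04-15' OR OrderDate = '2000-04-15' "
    "OR OrderDate = '1999-04-21' ORDER BY OrderID"
).fetchall()
# Same test written with IN.
with_in = conn.execute(
    "SELECT OrderID, ContactID FROM Orders "
    "WHERE OrderDate IN ('1999-04-15', '2000-04-15', '1999-04-21') ORDER BY OrderID"
).fetchall()
print(with_or, with_in)
conn.close()
```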
03
Simple Aggregations
Grouping/Summarizing Data
SQL includes functions that examine multiple rows of data at once and return a summary of that data
“COUNT()” – Returns the number of rows (optionally only those with a non-NULL value in a given column)
“SUM()” – Returns the sum of a column's values across all rows
“MIN()” – Returns the minimum value of a column across all rows
“MAX()” – Returns the maximum value of a column across all rows
“AVG()” – Returns the average value of a column across all rows
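All five functions can be tried in one query, as in the sqlite3 sketch below. The payments table and its four amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (amount REAL)")
conn.executemany(
    "INSERT INTO payments (amount) VALUES (?)",
    [(10.0,), (20.0,), (30.0,), (40.0,)],
)

# Each aggregate collapses all four rows into a single summary value.
count, total, smallest, largest, average = conn.execute(
    "SELECT COUNT(amount), SUM(amount), MIN(amount), MAX(amount), AVG(amount) "
    "FROM payments"
).fetchone()
print(count, total, smallest, largest, average)
conn.close()
```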
Grouping/Summarizing Data
“COUNT(Argument)” counts rows; the argument (“*”, a column, or “DISTINCT” plus a column) determines which rows are counted
Syntax:
SELECT COUNT(Argument) FROM Table
“WHERE” can be added to the query to filter out specific rows before counting; otherwise all rows are examined
Grouping/Summarizing Data
The argument to “COUNT()” can take three forms:
‘*’ – Counts all rows selected, including those with NULL values
‘ALL Column’ – Counts all rows with a non-NULL value for the specified column (“ALL” is the default and is usually omitted)
‘DISTINCT Column’ – Counts the unique non-NULL values in the specified column
Examples:
SELECT COUNT(*) FROM ZipCodes;
SELECT COUNT(State) FROM ZipCodes;
SELECT COUNT(DISTINCT State) FROM ZipCodes;
Grouping/Summarizing Data
Each “COUNT()” is returned as its own column, so several counts can be requested in one query by separating them with commas
Example:
SELECT COUNT(*) AS EntryCount, COUNT(DISTINCT State) AS HasState FROM ZipCodes;
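How the three forms of COUNT treat NULL and duplicate values can be seen in the sqlite3 sketch below. The ZipCodes rows are hypothetical and include one row with a NULL State.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ZipCodes (ZipCode TEXT, State TEXT)")
conn.executemany(
    "INSERT INTO ZipCodes (ZipCode, State) VALUES (?, ?)",
    [("10001", "NY"), ("10002", "NY"), ("33101", "FL"), ("00000", None)],
)

# Three counts returned as three columns of a single result row.
all_rows, with_state, distinct_states = conn.execute(
    "SELECT COUNT(*), COUNT(State), COUNT(DISTINCT State) FROM ZipCodes"
).fetchone()
print(all_rows, with_state, distinct_states)
conn.close()
```

COUNT(*) includes the row with the NULL State, COUNT(State) skips it, and COUNT(DISTINCT State) additionally collapses the repeated value.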
GROUP BY
Grouping Data is an essential part of summarisation
Grouping provides a clearer breakdown of the summarisation
“GROUP BY” is used to specify the grouping in order to return
multiple results of the summarisation
Example: Ottawa Breweries
There are 40 craft breweries in the Ottawa area
Our database contains the status of a brewery: whether it is active
or closed, as well as the name of the brewery, and the city council
district (ward) that it is located in.
name           | status | ward
=============================================
Ashton Brewery | active | Rideau-Jock
Big Rig        | active | Kitchissippi
Draft Horse    | closed | Orleans South-Navan
Flora Hall     | active | Somerset
Example: Ottawa Breweries
One operation we could do is a COUNT()
name           | status | ward
=============================================
Ashton Brewery | active | Rideau-Jock
Big Rig        | active | Kitchissippi
Draft Horse    | closed | Orleans South-Navan
Flora Hall     | active | Somerset
SELECT COUNT(*) as total_breweries FROM BREWERIES
total_breweries
===============
4
Example: Ottawa Breweries
A group by lets us do more interesting aggregation
name           | status | ward
=============================================
Ashton Brewery | active | Rideau-Jock
Big Rig        | active | Kitchissippi
Draft Horse    | closed | Orleans South-Navan
Flora Hall     | active | Somerset
SELECT status, COUNT(*) AS total_breweries FROM BREWERIES GROUP BY status
status | total_breweries
========================
active | 3
closed | 1
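The brewery example can be reproduced with a short sqlite3 sketch using the four rows shown in the table. The lowercase table name and the added ORDER BY (to fix the order of the groups) are choices made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE breweries (name TEXT, status TEXT, ward TEXT)")
conn.executemany(
    "INSERT INTO breweries (name, status, ward) VALUES (?, ?, ?)",
    [("Ashton Brewery", "active", "Rideau-Jock"),
     ("Big Rig", "active", "Kitchissippi"),
     ("Draft Horse", "closed", "Orleans South-Navan"),
     ("Flora Hall", "active", "Somerset")],
)

# Without GROUP BY: one count for the whole table.
(total,) = conn.execute("SELECT COUNT(*) FROM breweries").fetchone()

# With GROUP BY: one count per distinct status value.
per_status = conn.execute(
    "SELECT status, COUNT(*) AS total_breweries FROM breweries "
    "GROUP BY status ORDER BY status"
).fetchall()
print(total, per_status)
conn.close()
```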
Quiz!
Earn participation marks!
https://pollev.com/jamesbrunet123
Log in with your cmail address
yourname@cmail.carleton.ca
NOT your.name@carleton.ca
If time permits - demo!
That's all!
Do you have any questions? You can ask
now, or post to the Brightspace forums!
For personal questions, email:
jamesbrunet@cunet.carleton.ca