Filters & Aggregation
BIT 2008
Slides by James Brunet
Based on slides from David Sprague and Chris Joslin

Table of contents
01 What is SQL?
02 Filters
03 Aggregation

01 What is SQL?

Imperative vs Declarative
Imperative programming is when you instruct the computer, step by step, how to perform a task. The code you write describes how you want the task to be performed. Many programming languages you are familiar with (C, C++, Python) are imperative programming languages.

Imperative vs Declarative
Declarative programming is when you describe what you want without describing the step-by-step process. You don't have to say how the computer gets the result. SQL is a declarative programming language. (So is Prolog. Python supports some declarative features, but is primarily imperative.)

Example: Displaying Positive Numbers
    numbers = [-2, -1, 0, 1, 2, 3]
    positives = []
    for num in numbers:
        if num > 0:
            positives.append(num)
    print(positives)

Example: Displaying Positive Numbers
Assume we have a table named numbers with a single column, value, that contains integers.
    SELECT value FROM numbers WHERE value > 0;

SQL Introduction
An SQL statement performs one of the following operations:
● Extracting specific data from a database
● Inserting new data into a database
● Modifying existing data in a database
● Deleting data from a database

SQL Introduction
SQL is divided into 3 key components:
● Data Manipulation Language (DML) – the most commonly used component; it provides four basic access types to a database (SELECT, INSERT, UPDATE, DELETE)
● Data Definition Language (DDL) – used to create, modify, or remove tables and other database objects
● Data Control Language (DCL) – used to manage database security and user access to tables and data

SQL Introduction
SQL is used to write a structured statement that “queries” a database. That is, SQL is used to ask certain “questions” and have “answers” returned in the form of data from the database.
Example: for a University Course database, we might ask the question “Which courses are offered?” The answer would be the list of courses.

SELECT
The wildcard “*” stands for all columns.
Example: to retrieve all data from the table AddressBook:
    SELECT * FROM AddressBook;
“FROM” is the keyword that indicates which table the data comes from.

SELECT
In the previous example, the table was specified, but not the database or schema.
(Reminder: schemas are like folders for tables.)
To specify the database and schema as well as the table, use dot notation “.” in the query.
Example:
    SELECT * FROM jamesbrunet.public.addresses;

SELECT
Table and column names can cause problems, depending on the naming conventions used. A table name with spaces, such as My Table, needs to be handled in a specific way: in standard SQL it is enclosed in double quotes.
Example:
    SELECT * FROM "My Table";

SELECT
To access a single column, specify its name in place of the wildcard “*” after “SELECT”.
Example:
    SELECT Name FROM AddressBook;
Name is the name of the column; AddressBook is the name of the table.

SELECT Multiple Columns
It is equally possible to retrieve data from multiple columns by separating the column names with commas “,”.
Example:
    SELECT Name, City FROM AddressBook;
The columns are returned in the order requested.

SELECT: ALIASES
Aliases give another name to a column or table to make an SQL statement easier to read.
    SELECT name AS first_name, surname AS last_name FROM AddressBook;

SELECT: ALIASES
Aliases can be applied to tables as well.
    SELECT ab.name AS first_name, ab.surname AS last_name FROM AddressBook ab;

SELECT: ALIASES
Aliases are practical when returning calculations.
    SELECT CONCAT(name, ' ', surname) AS fullname FROM AddressBook;
Output:
    fullname
    ===============
    Jimmy Brunet
    Abel Charlebois
    ...

SELECT: ORDER BY
ORDER BY allows us to sort. By default, it sorts in ascending order.
    SELECT name AS first_name, surname AS last_name FROM AddressBook ORDER BY last_name;
Output:
    first_name | last_name
    ===========|===========
    Joseph     | Aaronson
    Jimmy      | Brunet
    Abel       | Charlebois
    Zoey       | Larsen

SELECT: ORDER BY
You can also sort in descending order.
    SELECT name AS first_name, surname AS last_name FROM AddressBook ORDER BY last_name DESC;
Output:
    first_name | last_name
    ===========|===========
    Zoey       | Larsen
    Abel       | Charlebois
    Jimmy      | Brunet
    Joseph     | Aaronson

02 Filters

Why filter?
The queries in the previous examples returned every row of the selected columns. In general, it is often desirable to return only the relevant rows; this is done with data filtering. Data filtering is a powerful part of SQL statements and has many elements.

DISTINCT
“DISTINCT” returns only the unique values of a column; i.e. each value appears once in the result, however many rows contain it.
Example:
    SELECT DISTINCT City FROM AddressBook;
Output:
    London
    Ottawa
    Chicago

DISTINCT
With multiple columns, “DISTINCT” returns the unique combinations of values, rather than the unique values of a single column.
Example:
    SELECT DISTINCT City, Country FROM AddressBook;
Output:
    London | Canada
    London | United Kingdom
    Ottawa | Canada

WHERE
The “WHERE” clause filters rows by a particular condition.
Syntax:
    SELECT Column FROM Table WHERE <Condition>
The condition can be constructed from various operators and keywords.

WHERE
“WHERE” applies a search condition to the rows in the table. The condition is a combination of operators and values, and a row is returned when the condition evaluates to TRUE for that row.
Examples:
    WHERE LastName = 'Turing'
would return all rows where LastName is exactly equal to the string 'Turing'.
    WHERE AccountBalance > 1000
would return all rows where AccountBalance is greater than 1000.
    WHERE LENGTH(LastName) > 3 AND LENGTH(LastName) < 6
would return all rows where the length of LastName is greater than 3 and less than 6.
(4 or 5 characters)

WHERE
“WHERE” can also compare two columns.
Example:
    SELECT first_name, last_name, address FROM Contacts WHERE first_name = last_name;
Example output:
    first_name | last_name | address
    ===========|===========|=====================
    Mohammad   | Mohammad  | 123 sample st
    Nguyen     | Nguyen    | 456 placeholder pl
    James      | James     | 789 demonstration dr

Different operators
    =    checks if values are equal
    !=   checks if values are not equal
    <>   same as “!=”
    >    greater than
    <    less than
    >=   greater than or equal to
    <=   less than or equal to

Combinations
A single condition in the “WHERE” clause might not be enough to return the required data, and it might be necessary to add more conditions. Conditions are joined with the logical operators “AND” or “OR”.
Syntax:
    SELECT Column FROM Table WHERE Condition AND/OR Condition

Combinations
Example:
    SELECT ZipCode, City, State FROM ZipCodes WHERE City = 'New York' OR City = 'Miami';
Here the rows where City is New York or Miami are returned.

NOT
“NOT” is an additional logical keyword that negates a condition: the negated condition is TRUE when the original condition is not met.
Example:
    SELECT City, Province FROM PostalCodes WHERE Province = 'ON' AND NOT City = 'Ottawa';

Ranges
Combined conditions can be used to perform range tests.
Example:
    SELECT PartDescription FROM Orders WHERE Ordered >= '2004-10-22' AND Ordered <= '2005-10-22';
You can also use BETWEEN for this purpose. BETWEEN includes both endpoints, so the two queries are equivalent.
    SELECT PartDescription FROM Orders WHERE Ordered BETWEEN '2004-10-22' AND '2005-10-22';

Ranges
It is also possible to determine whether a row's field is part of a set of values.
Example:
    SELECT Orders.OrderID, Orders.ContactID FROM Orders
    WHERE (((Orders.OrderDate)=#4/15/1999#)
        OR ((Orders.OrderDate)=#4/15/2000#)
        OR ((Orders.OrderDate)=#4/21/1999#));
(The #...# date literals are Microsoft Access syntax; most other databases write dates as quoted strings such as '1999-04-15'.)

IN
In the previous case the clause can become very large, very quickly. Instead we can use “IN” to test against a set.
Example: the previous example would be rewritten as
    SELECT OrderID, ContactID FROM Orders WHERE OrderDate IN (#4/15/1999#, #4/15/2000#, #4/21/1999#);
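
The range and set filters from this section can be tried out with a minimal sketch like the one below. It is only an illustration under stated assumptions: it uses SQLite through Python's built-in sqlite3 module rather than the course database, the five rows are made up, the helper name run is hypothetical, and dates are stored as ISO-8601 text (SQLite has no separate date type, and ISO-formatted strings compare in date order).

```python
import sqlite3

# In-memory database with hypothetical sample rows. The table and column
# names follow the slides; the data itself is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (OrderID INTEGER, PartDescription TEXT, Ordered TEXT)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [
        (1, "Bolt", "2004-10-21"),
        (2, "Nut", "2004-10-22"),
        (3, "Washer", "2005-03-01"),
        (4, "Gear", "2005-10-22"),
        (5, "Spring", "2005-10-23"),
    ],
)


def run(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]


# Range test written as two comparisons joined by AND.
with_and = run(
    "SELECT PartDescription FROM Orders "
    "WHERE Ordered >= '2004-10-22' AND Ordered <= '2005-10-22' "
    "ORDER BY OrderID"
)

# The same range with BETWEEN, which includes both endpoints.
with_between = run(
    "SELECT PartDescription FROM Orders "
    "WHERE Ordered BETWEEN '2004-10-22' AND '2005-10-22' "
    "ORDER BY OrderID"
)

# Set membership with IN, and its negation with NOT.
in_set = run(
    "SELECT PartDescription FROM Orders "
    "WHERE Ordered IN ('2004-10-21', '2005-10-23') ORDER BY OrderID"
)
not_in_set = run(
    "SELECT PartDescription FROM Orders "
    "WHERE NOT Ordered IN ('2004-10-21', '2005-10-23') ORDER BY OrderID"
)

print(with_and)      # ['Nut', 'Washer', 'Gear']
print(with_between)  # ['Nut', 'Washer', 'Gear']
print(in_set)        # ['Bolt', 'Spring']
print(not_in_set)    # ['Nut', 'Washer', 'Gear']
```

The AND and BETWEEN queries return the same rows, including the orders placed exactly on the two boundary dates. The ORDER BY OrderID is added only to make the printed order predictable.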
03 Simple Aggregations

Grouping/Summarizing Data
SQL includes functions that examine multiple rows of data at once and return a summary of that data:
● “COUNT()” – returns the number of rows meeting a certain condition
● “SUM()” – returns the sum of the values across all rows
● “MIN()” – returns the minimum value across all rows
● “MAX()” – returns the maximum value across all rows
● “AVG()” – returns the average value across all rows

Grouping/Summarizing Data
“COUNT(Condition)” counts the number of rows meeting the “Condition”.
Syntax:
    SELECT COUNT(Condition) FROM Table
“WHERE” can be added to the query to filter out specific rows; otherwise all rows are examined.

Grouping/Summarizing Data
The “Condition” takes one of these forms:
● ‘*’ – counts all rows selected, including those with NULL values
● ‘ALL Column’ – counts all rows with a non-NULL value for the specified column (ALL is the default, so COUNT(Column) means the same)
● ‘DISTINCT Column’ – counts the unique non-NULL values of the specified column
Examples:
    SELECT COUNT(*) FROM ZipCodes;
    SELECT COUNT(State) FROM ZipCodes;
    SELECT COUNT(DISTINCT State) FROM ZipCodes;

Grouping/Summarizing Data
Each ‘COUNT()’ is returned as a column, so several counts can be requested in one query by separating them with commas.
Example:
    SELECT COUNT(*) AS EntryCount, COUNT(DISTINCT State) AS HasState FROM ZipCodes;

GROUP BY
Grouping data is an essential part of summarisation, because it gives a clearer breakdown of the summary. “GROUP BY” specifies the grouping, so that the summary returns one result per group instead of a single result for the whole table.

Example: Ottawa Breweries
There are 40 craft breweries in the Ottawa area. Our database contains the status of each brewery (active or closed), the name of the brewery, and the city council district (ward) that it is located in.
    name           | status | ward
    ===============|========|====================
    Ashton Brewery | active | Rideau-Jock
    Big Rig        | active | Kitchissippi
    Draft Horse    | closed | Orleans South-Navan
    Flora Hall     | active | Somerset

Example: Ottawa Breweries
One operation we could do is a COUNT(). With the four sample rows above:
    SELECT COUNT(*) AS total_breweries FROM breweries;
Output:
    total_breweries
    ===============
    4

Example: Ottawa Breweries
A GROUP BY lets us do more interesting aggregation. With the same four sample rows:
    SELECT status, COUNT(*) AS total_breweries FROM breweries GROUP BY status;
Output:
    status | total_breweries
    =======|================
    active | 3
    closed | 1

Quiz!
Earn participation marks!
https://pollev.com/jamesbrunet123
Log in with your cmail address: yourname@cmail.carleton.ca, NOT your.name@carleton.ca

If time permits - demo!

That's all!
Do you have any questions? You can ask now, or post to the Brightspace forums!
For personal questions, email: jamesbrunet@cunet.carleton.ca

CREDITS: This presentation template was created by Slidesgo, including icons by Flaticon and infographics & images by Freepik
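
As a supplementary sketch, the Ottawa Breweries aggregation from section 03 can be reproduced with Python's built-in sqlite3 module. This is an illustration under assumptions, not the course setup: it uses an in-memory SQLite database, loads only the four sample rows shown in the slides, and the variable names total, by_status, and distinct_wards are hypothetical.

```python
import sqlite3

# In-memory database holding the four sample rows from the slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE breweries (name TEXT, status TEXT, ward TEXT)")
conn.executemany(
    "INSERT INTO breweries VALUES (?, ?, ?)",
    [
        ("Ashton Brewery", "active", "Rideau-Jock"),
        ("Big Rig", "active", "Kitchissippi"),
        ("Draft Horse", "closed", "Orleans South-Navan"),
        ("Flora Hall", "active", "Somerset"),
    ],
)

# COUNT(*) without GROUP BY: one summary row for the whole table.
total = conn.execute(
    "SELECT COUNT(*) AS total_breweries FROM breweries"
).fetchone()[0]

# COUNT(*) with GROUP BY: one summary row per distinct status.
# ORDER BY is added only so the result order is predictable.
by_status = conn.execute(
    "SELECT status, COUNT(*) AS total_breweries "
    "FROM breweries GROUP BY status ORDER BY status"
).fetchall()

# COUNT(DISTINCT column): number of unique non-NULL values.
distinct_wards = conn.execute(
    "SELECT COUNT(DISTINCT ward) FROM breweries"
).fetchone()[0]

print(total)           # 4
print(by_status)       # [('active', 3), ('closed', 1)]
print(distinct_wards)  # 4
```

Without GROUP BY the query collapses the table into a single row; with GROUP BY status it returns one row for each status value, which matches the two output tables in the brewery slides.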