SQL Basics

SQL: What is it?
• SQL stands for Structured Query Language.
• It was originally developed in the early 1970s by IBM as a way to manipulate and retrieve data stored in IBM’s relational DBMS, System R.
• It can be pronounced “Sequel” or “S-Q-L”.
• With some variations, SQL is more or less the standard query language for relational databases.

When do we use SQL?
• SQL is most commonly used to retrieve data from a database.
• It is also commonly used to modify data.
• SQL also contains commands for creating tables and other database objects, and for maintaining data security by granting and denying privileges to users.

How is SQL used?
• You can use SQL in Access using the Access Query Builder, which will be demonstrated in lab.
• All major DBMSs, like Oracle and MySQL, come with client tools which allow database managers and users to write SQL queries and send them to the database.
• The next slide shows a screenshot of one of these client tools, Microsoft’s SQL Server Management Studio.

[Screenshot: SQL Server Management Studio. Callouts: “Choose a database here.” “Type the SQL here, then Execute.” “See the results here.”]

SQL and VB Programming
• It is fairly easy to create VB programs which interact with databases in all major DBMSs.
• Assignment 2 will be a VB program which interacts with Access databases.

What can we do with SQL?
• SQL has four major components:
1. DQL (Data Query Language): Used to retrieve data from a database.
2. DML (Data Manipulation Language): Used to modify the data in a database by inserting, updating, and deleting.
3. DDL (Data Definition Language): Used to design tables and other database objects.
4. DCL (Data Control Language): Used to control access to the data using security methods.
• We will only look at DML and DQL in this course.

SELECT Queries
• SELECT queries are the most commonly used queries in SQL.
• They are used to retrieve data from one or more tables.
• Basic format:
    SELECT [Fields] FROM [Table(s)] WHERE [Conditions] ORDER BY [Fields] ASC/DESC
• Example:
    SELECT PlayerName, PlayerPosition FROM Players WHERE TeamID=2 ORDER BY Age DESC

WHERE Clause
• A WHERE clause is a set of conditions following the word “WHERE” in a query.
• WHERE clauses are used in DELETE FROM, UPDATE, and SELECT queries.
• They restrict the rows to be deleted, updated, or selected.
• If you write a SQL query without a WHERE clause, all records in the table will be affected: deleted, updated, or selected.

Sample WHERE Clauses
• Most WHERE clauses are easy to understand.
• They can use AND, OR, and NOT, with parentheses to change the order of operations.
• Examples:
– WHERE PlayerID = 7
– WHERE TeamID=2 AND Age>24
– WHERE NOT (TeamID=1 AND Age<=30)
– WHERE Age>20 OR Age<30 [Affects every record with a non-NULL Age, because every number satisfies at least one of the two conditions]
– WHERE Age<20 AND Age>30 [Won’t affect any records, because no number satisfies both conditions]

BETWEEN and IN in WHERE Clauses
• BETWEEN and IN can be used to simplify WHERE clauses.
• BETWEEN combines two inequalities:
– WHERE Age BETWEEN 24 AND 27
  is the same as
  WHERE Age>=24 AND Age<=27
– BETWEEN is always inclusive; that is, it includes the endpoints.
• IN combines multiple ORs:
– WHERE TeamID IN (2,3,5,8)
  is the same as
  WHERE TeamID=2 OR TeamID=3 OR TeamID=5 OR TeamID=8

The Fields List
• The list of fields following “SELECT” can include many things:
– *
– Field1, Field2
– Field1 AS SomethingElse (an aliased field)
– A math expression which can include field names (a calculated field)
– An aggregate expression like MAX(Age) or COUNT(*)
– A constant value, for example 2, 'Smith', or NULL. This can be useful when normalizing a database by combining several tables with identical fields into one.

*
• Using an asterisk (*) after SELECT tells the database to send us all of the fields in the table:
[Screenshot of the query and its results.]
• Since the query used an asterisk, it returns all seven columns. It also returns all of the rows in the table. Why?
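
The WHERE-clause claims on the preceding slides (BETWEEN and IN as shorthand, the always-true and never-true conditions, and what happens when there is no WHERE clause at all) can be checked with a small script. This is only a minimal sketch, not part of the course materials: it uses Python's built-in sqlite3 module as a stand-in for Access, the helper `ids` is hypothetical, and the Players rows are invented for illustration.

```python
import sqlite3

# Hypothetical Players table; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Players (PlayerID INTEGER, TeamID INTEGER, Age INTEGER)")
conn.executemany(
    "INSERT INTO Players VALUES (?, ?, ?)",
    [(1, 2, 24), (2, 3, 27), (3, 5, 19), (4, 8, 31), (5, 1, 25), (6, 2, 30)],
)


def ids(where=""):
    """Return the PlayerIDs selected by an optional WHERE clause, in ID order."""
    sql = f"SELECT PlayerID FROM Players {where} ORDER BY PlayerID"
    return [row[0] for row in conn.execute(sql)]


all_rows = ids()  # no WHERE clause, so every row is selected

# BETWEEN is shorthand for two inclusive inequalities.
between = ids("WHERE Age BETWEEN 24 AND 27")
inequalities = ids("WHERE Age>=24 AND Age<=27")

# IN is shorthand for a chain of ORs.
in_list = ids("WHERE TeamID IN (2,3,5,8)")
or_chain = ids("WHERE TeamID=2 OR TeamID=3 OR TeamID=5 OR TeamID=8")

# Every non-NULL age passes the first condition; no age can pass the second.
always = ids("WHERE Age>20 OR Age<30")
never = ids("WHERE Age<20 AND Age>30")

print(between, in_list, always, never)
```

Each shorthand returns the same rows as its long form, and leaving off the WHERE clause returns every row, which answers the “Why?” above.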
SELECT * FROM TableName
• “SELECT * FROM TableName” is the easiest way to view an entire table using SQL.
• If you, as a SQL expert, are trying to find data in an unfamiliar database, this can be a good way to start.
• However, if the table is large (lots of rows), that query might take a long time to run.
• If you just want to see the names of the columns, you can run a SELECT query with an impossible WHERE clause, such as:
    SELECT * FROM Players WHERE 1=0
• Here are the results of that query:
[Screenshot of query results: column headings with no rows.]
• To summarize:
– “SELECT *” means “get all fields”.
– To get all rows, just leave off the WHERE clause.
– To get no rows, add an impossible WHERE clause.

Back to the Field List
• While putting “*” after SELECT is easiest, it isn’t always the best way to query a table.
• This is especially true when writing VB code: since you will need to refer to individual field names in your code, you can simplify writing the code by putting the actual field names in the query.
• So, instead of writing “SELECT * FROM Players”, we would write:
    SELECT PlayerID, TeamID, LastName, FirstName, Age, PlayerPosition, PhoneNumber FROM Players
• Now, when we write the code which accesses these fields, we don’t have to go back and look at the database to remember what their names were.
• And, if we don’t need certain fields, we can leave them out of the list. If the table is huge, this can make the program run faster.

Aliases
• Field names can be “aliased”; that is, given different names when displayed in the results.
• This is done using the SQL keyword “AS”:
    SELECT PhoneNumber AS Phone FROM Players
• You can also use this to break the rules about field names when you display them: just put the “incorrect” alias in square brackets:
    SELECT PhoneNumber AS [Phone #] FROM Players

Aliases and Calculated Fields
• SQL can do arithmetic. You can include calculated fields in SELECT queries. The calculations can include the other fields in the table.
• Aliases should always be used for calculated fields in SELECT queries. Examples (sorry, I don’t actually have a CarData table to demonstrate these):
    SELECT CarID, CarName, Miles, Gallons, Miles/Gallons AS [Miles per Gallon] FROM CarData
    SELECT CarID, Miles*1.61 AS Kilometers FROM CarData
    SELECT FirstName & ' ' & LastName AS [Name] FROM Players
• Note: Access allows both single (') and double (") quotes for string constants. Some other DBMSs reserve double quotes for the names of tables and fields, so single quotes are the safer choice.
• I recommend that you use single quotes, especially when creating SQL statements in VB.

ORDER BY
• Query results can be sorted in a desired order by using ORDER BY at the end of the query.
• You should never rely on query results being sorted UNLESS the query includes an ORDER BY clause!
    SELECT * FROM Players ORDER BY Age ASC
• That query will put the youngest players at the top of the list, with the oldest at the bottom.
• ASC stands for ascending order. With numbers, this means smallest to largest; with strings, it means alphabetical order.
• DESC is the opposite of ASC.
• ASC is the default for ORDER BY; if you simply write “ORDER BY Age”, SQL will interpret it as “ORDER BY Age ASC”.

ORDER BY Examples
    SELECT LastName, FirstName FROM Players ORDER BY LastName, FirstName
• For that query, “FirstName” is the secondary sort order: the tie breaker. For players with the same last name, those with first names beginning with A will be listed before those beginning with B, and so on. “Jones, Adam” comes before “Jones, Bob”.
    SELECT * FROM Players ORDER BY TeamID ASC, Age DESC
• That query will display all members of team 1 before all members of team 2, but within each team, the oldest players will be listed first.

TOP
• To find only the best (or worst) records in any result set, you can use the keyword TOP and the number of records you want to see right after SELECT.
• If you use TOP, you should always have an ORDER BY clause. Remember that you cannot count on query results being sorted in any particular way UNLESS you have an ORDER BY clause.
    SELECT TOP 5 LastName, FirstName, GPA FROM Students ORDER BY GPA
• That query will find the five WORST students in the table. Don’t let the word “TOP” confuse you. In SQL queries, it doesn’t mean “best”; it means the records that will appear at the TOP of the result set. In this case, ORDER BY GPA will list GPAs from lowest to highest, since the default order for ORDER BY is ASC.
• If you want the five BEST students, use
    SELECT TOP 5 LastName, FirstName, GPA FROM Students ORDER BY GPA DESC
• Note that there is no “BOTTOM” keyword in SQL; to find the bottom, reverse the sort order and use TOP.
• TOP will display ties, at least in Access. For example, in the last query, if there are seven students with 4.0 GPAs, all seven will be displayed as being in the top 5.

Aggregates
• Aggregates are fields in queries which summarize the values for all records in the table, or for subsets of the records. The aggregates you should know for this class are COUNT, MAX, MIN, SUM, AVG, and STDEV.
– MIN: Find the minimum value (or the earliest date for dates) in the specified field in the rows meeting the conditions.
– MAX: Find the maximum value (or the latest date for dates) in the specified field in the rows meeting the conditions.
– SUM: Add the values in the specified field in the rows meeting the conditions.
– AVG: Find the mean value of the specified field in the rows meeting the conditions.
– STDEV: Find the standard deviation of the values of the specified field in the rows meeting the conditions.
– COUNT: Find out how many rows meet the conditions. For this course, always use COUNT(*), not COUNT(SomeField). They generally return the same number, but COUNT on a specific field skips rows where that field is NULL, so it may return a smaller number.
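
The difference between COUNT(*) and COUNT(SomeField), and the "TOP follows the sort order" rule, can be illustrated with a short script. This is a hedged sketch rather than the course's own example: it runs against SQLite through Python's built-in sqlite3 module, the Students rows and the PhoneNumber column are invented, and SQLite has no TOP keyword, so `LIMIT n` at the end of the query is swapped in for `TOP n`. Unlike TOP in Access, LIMIT does not return extra rows for ties. SQLite also has no built-in STDEV, so it is left out.

```python
import sqlite3

# Hypothetical Students table; two students have no phone number (NULL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (LastName TEXT, GPA REAL, PhoneNumber TEXT)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [
        ("Adams", 3.5, "555-0101"),
        ("Baker", 2.0, None),
        ("Clark", 4.0, "555-0103"),
        ("Davis", 3.0, None),
    ],
)

# One row, six aliased aggregate columns.
count_all, count_phone, min_gpa, max_gpa, sum_gpa, avg_gpa = conn.execute(
    "SELECT COUNT(*) AS NumRows, COUNT(PhoneNumber) AS NumPhones, "
    "MIN(GPA) AS MinGPA, MAX(GPA) AS MaxGPA, "
    "SUM(GPA) AS TotalGPA, AVG(GPA) AS AvgGPA FROM Students"
).fetchone()

# Access: SELECT TOP 2 LastName FROM Students ORDER BY GPA
# SQLite stand-in: LIMIT 2 at the end of the query.
worst_two = [r[0] for r in conn.execute(
    "SELECT LastName FROM Students ORDER BY GPA LIMIT 2")]
best_two = [r[0] for r in conn.execute(
    "SELECT LastName FROM Students ORDER BY GPA DESC LIMIT 2")]

print(count_all, count_phone)  # COUNT(PhoneNumber) skips the NULLs
print(worst_two, best_two)
```

COUNT(*) counts all four rows while COUNT(PhoneNumber) counts only the two non-NULL phone numbers, which is why the slides recommend COUNT(*).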
Aggregate Examples
• (See the TripSummary table in the SampleData.accdb database to see what these queries refer to, although you can probably understand them without having to look at the database.)
    SELECT MAX(DistanceKilometers) AS MaxDistance FROM TripSummary
• Returns a result set with one column (MaxDistance) and one row. The one cell in the result set contains the maximum distance from all trips in the table.
    SELECT AVG(DurationMinutes) AS AvgTime FROM TripSummary WHERE Driver=4
• Again, this returns a one-column, one-row result set containing the average duration of driver 4’s trips.
    SELECT COUNT(*) AS NumTrips FROM TripSummary WHERE Driver=37
• Returns the number of rows having Driver = 37.

AAA: Always Alias Aggregates
• If your query says
    SELECT MAX(Age) FROM Players
  Access will return the following:
[Screenshot of query results.]
• Access doesn’t have a name for MAX(Age), so it makes one up: “Expr1000”.
• You should always alias aggregates with a meaningful name:
    SELECT MAX(Age) AS MaxAge FROM Players

GROUP BY
• Consider this table called Workers (also available in SampleData.accdb):
[Screenshot of the Workers table.]
• If you want to know the maximum age of all workers, you use this query:
    SELECT MAX(Age) AS MaxAge FROM Workers
  which will return

    MaxAge
    64

• But if you want to know the maximum age by JobType, use GROUP BY:
    SELECT JobType, MAX(Age) AS MaxAge FROM Workers GROUP BY JobType
  which returns

    JobType       MaxAge
    Clerical      44
    Management    51
    Other         55
    Professional  64
    Sales         22
    Service       24

Another GROUP BY Example
• If you want to know how many men and women there are in the list:
    SELECT Sex, COUNT(*) AS Num FROM Workers GROUP BY Sex

    Sex  Num
    F    5
    M    14

And Another!
    SELECT JobType, Sex, COUNT(*) AS Num FROM Workers GROUP BY JobType, Sex

    JobType       Sex  Num
    Clerical      F    2
    Clerical      M    1
    Management    M    2
    Other         F    1
    Other         M    4
    Professional  F    1
    Professional  M    3
    Sales         M    2
    Service       F    1
    Service       M    2

MAX/MIN vs. TOP
• Above, we saw the results of this query:
    SELECT MAX(Age) AS MaxAge FROM Workers

    MaxAge
    64

• However, we can get the same result using TOP:
    SELECT TOP 1 Age FROM Workers ORDER BY Age DESC
• In addition, we can find out everything about the oldest worker using TOP, something we can’t do with MAX alone:
    SELECT TOP 1 * FROM Workers ORDER BY Age DESC

    WorkerID  YearsOfSchool  Sex  YearsWorking  UnionMember  Wage  Age  JobType
    1485      18             F    40            No           22.2  64   Professional

Combining TOP and Aggregates
• Therefore, if you just want to find one maximum or minimum and want to know something about the record that possesses that value, use TOP. In most cases, it will be preferable to MAX or MIN.
• However, if you want to calculate maximums or minimums by category, use MAX and MIN along with GROUP BY.
• You can even combine TOP with GROUP BY:
    SELECT TOP 3 JobType, AVG(Wage) AS AvgWage FROM Workers GROUP BY JobType ORDER BY AVG(Wage) DESC

    JobType       AvgWage
    Management    18.695
    Professional  16.2225
    Other         9.388

• This tells us the three highest-paying job categories and the corresponding average wages.
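
The GROUP BY queries and the "TOP versus MAX" comparison above can be tried on a toy table. This is a sketch under stated assumptions, not the SampleData.accdb data: the six Workers rows are invented and use only some of the real table's columns, the script runs on SQLite through Python's built-in sqlite3 module, and `LIMIT n` stands in for Access's `TOP n`. An ORDER BY is added to the grouped queries so the row order is guaranteed.

```python
import sqlite3

# Hypothetical, much smaller Workers table (invented rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Workers (WorkerID INTEGER, JobType TEXT, Sex TEXT, "
    "Age INTEGER, Wage REAL)"
)
conn.executemany(
    "INSERT INTO Workers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Clerical", "F", 44, 8.0),
        (2, "Clerical", "M", 30, 10.0),
        (3, "Management", "M", 51, 20.0),
        (4, "Management", "M", 40, 16.0),
        (5, "Sales", "M", 22, 6.0),
        (6, "Professional", "F", 64, 22.0),
    ],
)

# One aggregate value per JobType.
max_by_job = conn.execute(
    "SELECT JobType, MAX(Age) AS MaxAge FROM Workers "
    "GROUP BY JobType ORDER BY JobType"
).fetchall()

# One row count per Sex.
by_sex = conn.execute(
    "SELECT Sex, COUNT(*) AS Num FROM Workers GROUP BY Sex ORDER BY Sex"
).fetchall()

# Access: SELECT TOP 2 JobType, AVG(Wage) AS AvgWage ... ORDER BY AVG(Wage) DESC
top_two = conn.execute(
    "SELECT JobType, AVG(Wage) AS AvgWage FROM Workers "
    "GROUP BY JobType ORDER BY AVG(Wage) DESC LIMIT 2"
).fetchall()

# Access: SELECT TOP 1 * FROM Workers ORDER BY Age DESC
oldest = conn.execute(
    "SELECT * FROM Workers ORDER BY Age DESC LIMIT 1"
).fetchone()

print(max_by_job, by_sex, top_two, oldest, sep="\n")
```

The last query returns the whole row for the oldest worker, whereas `SELECT MAX(Age)` on its own would return only the number 64.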