SQL for Crime Analysts
BACIAA Session
Thursday, 22 March 2012
James G. Beldock
Today’s Agenda
• Introductions
• Preliminaries
• Databases, Structured Data, and Tables
• Demo 1: Exploring Tables
• How Databases Are Structured (& Why)
• Demo 2: Lots of Tables
• Break
• A Sample CAD Database
• SQL SELECT, part 1
• Using database data in Excel
• Lunch
• Joins
• SQL SELECT, part 2
• Joining
• “Saving” Joins to a View
• Break
• Views
• Other SQL Commands
PRELIMINARIES
1. Databases, Database Varieties, and SQL
2. How Databases Are Structured (& Why)
• Store data permanently
  • Sometimes called “persistent storage”
• Data can be
  • Structured data
    • A Person has: First Name; Last Name; Social Security Number; Photo.JPG
  • Unstructured data
    • examples: Moby Dick; an entire website; email messages (sometimes)
• Sizes
  • Databases can be small (100K, 1MB, etc.) or
  • Quite Large (UK Land Registry is 23TB; that’s ~1.1 Libraries of Congress) [1]; that’s 1.84 x 100,000,000,000,000 bits! [2]
  • RIDICULOUSLY LARGE (Google’s index of the web; Facebook’s profiles database)
[1] DB2 - the secret database (http://www.theregister.co.uk/2006/01/18/db2_neglected/)
[2] Wolfram Alpha is great for this sort of thing: http://www.wolframalpha.com/input/?i=23+terabytes
Databases [silicon valley moment]
• Recently, SQL-running databases have fallen somewhat out of fashion
  • SQL was never cool
  • Now it’s officially “uncool” for some purposes, like building Netflix
    • Highly scalable (thousands of servers?)
    • Very flexible data structures
• Today’s session is all about SQL, and SQL is (usually) used with relational databases, which are, if you ask the cool people, not as cool as they used to be.
• SQL is still the world’s most widely used database language, and SQL databases certainly store more structured data than any other environment ever built.
Structured Data
• SQL deals with structured data [3]
• Structured Data
  • Keeps track of one or more types of things, called Entities (or TABLEs in SQL)
  • Knows certain, specific, structured pieces of information about those entities, called Attributes (or COLUMNs in SQL)
Sample Structured Data: a TABLE of Customers [4]
Note: SQL keywords will be in blue. They are traditionally written in ALL CAPS. Names of Tables or Columns will appear in Brown or Orange, respectively. They are traditionally Capitalized (but not ALL CAPS).
[3] Well, nearly always. But not always always: Storing Unstructured Data in SQL Server 2008 – Microsoft,
[4] source: SqlCourse2.com, http://www.sqlcourse2.com/index.html
Database TABLES
[Figure: the Customers table, labeled with the name of the TABLE, the names of the COLUMNs, the ROWs, the COLUMNs, and a single ROW]
Question: What’s the name of a ROW?
• 5 COLUMNs (also called Fields):
  • customerid: some type of number; probably a Unique Identifier
  • firstname: text (called a String); probably not unique
  • lastname: string; probably not unique
  • city: string; probably not unique
  • state: string; probably not unique
• Unique IDs are called KEYs
• The KEY used to name a ROW is called the PRIMARY KEY
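
As a rough sketch, the five COLUMNs described above could be declared like this. The column types and lengths are assumptions (the slide only says "some type of number" and "string"), and the square brackets around state are SQL Server's way of quoting a column name.

```sql
-- Sketch only: the types and lengths below are assumptions, not from the slide.
CREATE TABLE customers (
    customerid INT NOT NULL PRIMARY KEY,  -- the unique identifier (KEY) that names each ROW
    firstname  VARCHAR(50),               -- a string; probably not unique
    lastname   VARCHAR(50),               -- a string; probably not unique
    city       VARCHAR(50),               -- a string; probably not unique
    [state]    VARCHAR(50)                -- a string; probably not unique
);

-- Add one ROW. A second ROW with customerid 10101 would be rejected,
-- because the PRIMARY KEY must be unique.
INSERT INTO customers (customerid, firstname, lastname, city, [state])
VALUES (10101, 'John', 'Gray', 'Lynden', 'Washington');
```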
Before We Go Further: SQL
• That is why you’re here, right?
• Structured Query Language (SQL) is:
  • A language for asking a database for information (“querying”)
  • A language for changing information in a database
  • A language for changing the structure of a database
  • A language for adjusting security, performance, and deployment of databases
  • A language for destroying everything in the database…but don’t worry :-)
• Querying and changing information are often called Data Manipulation Language (DML): Create, Read, Update, Delete
• The rest is DANGEROUS (seriously, it is called admin functionality, or Data Definition Language, DDL)
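
To make the DML/DDL split concrete, here is a small sketch. The contacts table and its columns are hypothetical, invented only for this illustration.

```sql
-- Hypothetical table for illustration; the name, columns, and types are assumptions.

-- DDL: changes the structure of the database.
CREATE TABLE contacts (
    contactid INT NOT NULL PRIMARY KEY,
    lastname  VARCHAR(50)
);

-- DML: Create, Read, Update, Delete.
INSERT INTO contacts (contactid, lastname) VALUES (1, 'Smith');   -- Create
SELECT contactid, lastname FROM contacts;                         -- Read
UPDATE contacts SET lastname = 'Jones' WHERE contactid = 1;       -- Update
DELETE FROM contacts WHERE contactid = 1;                         -- Delete

-- DDL again, and the DANGEROUS kind: removes the table and every ROW in it.
DROP TABLE contacts;
```

Note that UPDATE and DELETE without a WHERE clause apply to every ROW in the table, which is one reason to test the WHERE clause with a SELECT first.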
SQL’s SELECT Statement
• The single most important SQL statement. Period.
• “Selects” data out of a database, or performs a calculation on a column, value, table, etc.
• Really simple examples:
  • SELECT 'hello'
    → hello
  • SELECT 1 + 3
    → 4
SELECT Statement, continued
• More commonly, the basic SELECT statement returns ROWs from a TABLE:
  • SELECT firstname FROM customers
    → John
      Leroy
      Elroy
      Lisa
  • SELECT firstname, city FROM customers
    → John   Lynden
      Leroy  Pinetop
      Elroy  Snoqualmie
      Lisa   Oshkosh
  • SELECT * FROM customers
    → 10101  John   Gray    Lynden      Washington
      10298  Leroy  Brown   Pinetop     Arizona
      10299  Elroy  Keller  Snoqualmie  Washington
      10315  Lisa   Jones   Oshkosh     Washington
• A special COLUMN name: * means “all COLUMNs”
SELECT Statement: the Important Options (for one table) [5]
SELECT     list of columns, functions on columns, or *
FROM       name of table
WHERE      list of conditions to include (called “predicates”)
ORDER BY   list of columns and direction of sort (ascending/descending)
GROUP BY   list of columns
[5] The full definition of the SQL SELECT statement syntax is much longer and, to some extent, specific to the database software. See the definition of Microsoft SQL Server 2008 R2’s SELECT statement at http://msdn.microsoft.com/en-us/library/ms189499.aspx
SELECT…ORDER BY
• Use ORDER BY to sort by one or more columns, in ascending or descending order
[Figure: effect of ORDER BY clause]
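
A minimal sketch of ORDER BY, using the four customers from the earlier SELECT examples. The column types in the setup are assumptions; the rows are the ones shown on the slides.

```sql
-- Setup: sample rows from the customers examples; column types are assumptions.
CREATE TABLE customers (
    customerid INT NOT NULL PRIMARY KEY,
    firstname  VARCHAR(50),
    lastname   VARCHAR(50),
    city       VARCHAR(50),
    [state]    VARCHAR(50)
);
INSERT INTO customers (customerid, firstname, lastname, city, [state]) VALUES
    (10101, 'John',  'Gray',   'Lynden',     'Washington'),
    (10298, 'Leroy', 'Brown',  'Pinetop',    'Arizona'),
    (10299, 'Elroy', 'Keller', 'Snoqualmie', 'Washington'),
    (10315, 'Lisa',  'Jones',  'Oshkosh',    'Washington');

-- Sort by one column. ASC (ascending) is the default direction.
-- Rows come back as Brown, Gray, Jones, Keller.
SELECT firstname, lastname
FROM customers
ORDER BY lastname;

-- Sort by two columns: state descending, then city ascending within each state.
-- The three Washington rows (Lynden, Oshkosh, Snoqualmie) come before Arizona.
SELECT firstname, city, [state]
FROM customers
ORDER BY [state] DESC, city ASC;
```

Without an ORDER BY clause, the database is free to return ROWs in any order.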
SELECT…WHERE
• Use WHERE to filter
  • based on one criterion:
  • or more than one:
• Why the [square brackets]? The word state is a reserved SQL keyword. When it is used as a column name, it must be [bracketed] to avoid confusion.
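
A sketch of what filtering with one criterion, and then with more than one, might look like. These queries are illustrations written for this example, not the ones shown in the session; the column types in the setup are assumptions. Square brackets are the SQL Server convention (standard SQL uses "double quotes" for the same purpose).

```sql
-- Setup: sample rows from the customers examples; column types are assumptions.
CREATE TABLE customers (
    customerid INT NOT NULL PRIMARY KEY,
    firstname  VARCHAR(50),
    lastname   VARCHAR(50),
    city       VARCHAR(50),
    [state]    VARCHAR(50)
);
INSERT INTO customers (customerid, firstname, lastname, city, [state]) VALUES
    (10101, 'John',  'Gray',   'Lynden',     'Washington'),
    (10298, 'Leroy', 'Brown',  'Pinetop',    'Arizona'),
    (10299, 'Elroy', 'Keller', 'Snoqualmie', 'Washington'),
    (10315, 'Lisa',  'Jones',  'Oshkosh',    'Washington');

-- One criterion: only the ROWs whose state is Washington (John, Elroy, and Lisa).
SELECT firstname, lastname
FROM customers
WHERE [state] = 'Washington';

-- More than one criterion, combined with AND: both must be true (only John).
SELECT firstname, lastname
FROM customers
WHERE [state] = 'Washington'
  AND city = 'Lynden';

-- Combined with OR: either may be true (John and Leroy).
SELECT firstname, lastname
FROM customers
WHERE city = 'Lynden'
   OR [state] = 'Arizona';
```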
FUNCTIONS
• You can add functions to a SQL SELECT statement to perform various analyses.
• The most common [6] are
  • Aggregate functions
    • count(), which returns the number of somethings, and
    • sum(), which adds up the somethings
    • Also: min(), max(), avg(), stdev(), var()
  • Math, Date and String (text) Manipulation functions
    • Math: abs(), ceiling(), power(), sqrt(), others
    • String: len(), substring(), replace(), upper(), lower(), left(), right(), others
    • Date: dateadd(), datediff(), datepart(), getdate(), day(), month(), year(), others
[6] The full list is quite long. For SQL Server, see http://msdn.microsoft.com/en-us/library/aa258899(v=sql.80).aspx.
[DEMO] Using Functions, WHERE, and ORDER BY
• Summary:
  • count(*) gives you the count of rows resulting from your query
  • You can SELECT any combination of columns
  • Unless you GROUP BY, in which case you are limited to the GROUPed BY columns and aggregate functions applied to other columns
• Gotchas
  • sum(*) doesn’t make sense, but sum(columnname) does, for columns of numbers
  • GROUP BY is finicky: the list of columns you select is limited
  • Some things aren’t easy: for example, finding the percent of total
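
The summary and gotchas above can be sketched as follows. The orders table here is a hypothetical single-table example made up for this illustration (names, types, and rows are assumptions), and the percent-of-total query shows just one possible approach, using a subquery for the grand total.

```sql
-- Hypothetical sample data; table name, columns, types, and rows are assumptions.
CREATE TABLE orders (
    orderid      INT NOT NULL PRIMARY KEY,
    customername VARCHAR(50),
    quantity     INT,
    productname  VARCHAR(50)
);
INSERT INTO orders (orderid, customername, quantity, productname) VALUES
    (1000, 'James',  3, 'Orange'),
    (1001, 'James',  4, 'Apple'),
    (1002, 'George', 1, 'Fork'),
    (1003, 'James',  6, 'Pear');

-- count(*) counts the ROWs your query returns (4 here).
SELECT COUNT(*) AS order_count FROM orders;

-- GROUP BY: select only the grouped column plus aggregates of other columns.
-- Clause order is WHERE, then GROUP BY, then ORDER BY.
-- James has 3 orders totalling 13 items; George has 1 order totalling 1.
SELECT customername,
       COUNT(*)      AS order_count,
       SUM(quantity) AS total_quantity
FROM orders
WHERE quantity > 0
GROUP BY customername
ORDER BY total_quantity DESC;

-- Gotcha: productname is neither grouped nor aggregated, so SQL Server
-- rejects a query like this one.
-- SELECT customername, productname, SUM(quantity) FROM orders GROUP BY customername;

-- Percent of total: divide each group's sum by the grand total from a subquery.
-- Multiplying by 100.0 (not 100) avoids whole-number division.
SELECT customername,
       SUM(quantity) * 100.0 / (SELECT SUM(quantity) FROM orders) AS percent_of_total
FROM orders
GROUP BY customername;
```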
Terminology: DBMS
• “Database” is a generic term; it can refer to:
  • A specific set of data hosted on a Database Server
  • A Database Server itself (not really the right term)
  • A large body of information kept by a human being (“my recipe database”)
• Databases generally run on a Database Server
  • A computer running a Database Management System (DBMS)
  • Accepts requests (“queries”) from many client computers
  • Returns a response (“result set”) to each client for each query
  • Can be distributed across lots of servers (Facebook: 1,800+ MySQL servers)
• A DBMS can handle multiple databases
  • Each Database is stored in one or more “database files”
  • Database Files can sometimes be loaded/viewed/edited by other software
Names You Might Encounter (in the Database World)
• SQL Server, from Microsoft (also “Microsoft SQL Server”)
• Oracle
• DB2, from IBM
• Less common:
  • Microsoft Access, dBase, Sybase
Database Structures
• Most databases have many TABLEs
  • 10 would be “few”; 50 would be normal; 150 would be many
• There is a method to this madness
  • Different TABLEs contain different categories of information
  • Example:
    • Customers: contains lots of customers
    • Products: contains lots of products
    • Orders: combines customers and products (and quantities, etc.)
Why So Many Tables?
• Imagine a world with just 1 table
• The problem of duplicate data:

OrderID | CustomerName | CustomerAddress                      | Quantity | ProductName
1000    | James        | 123 Main Street, Arcadia, CA, 95000  | 3        | Orange
1001    | James        | 123 Main Street, Arcadia, CA, 95000  | 4        | Apple
1002    | George       | 444 1st Avenue, Sacramento, CA 97000 | 1        | Fork

• Adding a new order is easy:

1003    | James        | 123 Main Street, Arcadia, CA 95000   | 6        | Pear

• But what happens when James changes his address?
  • Answer: need to update every ROW where 'James' is the CustomerName (ugh!)
Solution: Divide and Conquer
• Divide data into Entities (TABLEs), specific to a given purpose:

CustomerID | CustomerName | CustomerAddress
1          | James        | 123 Main Street, Arcadia, CA, 95000
2          | George       | 444 1st Avenue, Sacramento, CA 97000

OrderID | CustomerID | Quantity | ProductName
1000    | 1          | 3        | Orange
1001    | 1          | 4        | Apple
1002    | 2          | 1        | Fork
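
A sketch of the divided design in SQL, to show why it solves the address problem. The table names customers and orders follow the earlier "Database Structures" slide; the column types, the new address, and the JOIN query (a topic from later in the agenda) are assumptions added for illustration.

```sql
-- Column names follow the slide; types are assumptions.
CREATE TABLE customers (
    customerid      INT NOT NULL PRIMARY KEY,
    customername    VARCHAR(50),
    customeraddress VARCHAR(200)
);
CREATE TABLE orders (
    orderid     INT NOT NULL PRIMARY KEY,
    customerid  INT NOT NULL REFERENCES customers (customerid),  -- points at one customer
    quantity    INT,
    productname VARCHAR(50)
);

INSERT INTO customers (customerid, customername, customeraddress) VALUES
    (1, 'James',  '123 Main Street, Arcadia, CA, 95000'),
    (2, 'George', '444 1st Avenue, Sacramento, CA 97000');
INSERT INTO orders (orderid, customerid, quantity, productname) VALUES
    (1000, 1, 3, 'Orange'),
    (1001, 1, 4, 'Apple'),
    (1002, 2, 1, 'Fork');

-- James moves (hypothetical new address): exactly one ROW changes,
-- no matter how many orders he has placed.
UPDATE customers
SET customeraddress = '500 Oak Street, Arcadia, CA 95000'
WHERE customerid = 1;

-- A JOIN puts the pieces back together, matching each order to its customer.
SELECT o.orderid, c.customername, c.customeraddress, o.quantity, o.productname
FROM orders AS o
JOIN customers AS c ON c.customerid = o.customerid
ORDER BY o.orderid;
```

After the update, both of James's orders show the new address in the joined result, because the address is stored in only one place.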