Geoff Kimber Realize the importance of learning SQL Recognize SQL platform differences Identify relational database concepts Apply SQL syntax to a real-life example Review SQL references Data organized in tree structure with parent and child segments Parent to child is a one to many relationship Implies repeating information, generally in child segments Fileman Prescription file with refill multiple field Data consists of records stored in tables Each row is unique Column values are of the same data type Sequence of rows and columns is insignificant Structured Query Language [SQL] Is an ANSI computer language that is used to interact with many relational databases ▪ Oracle ▪ MS-SQL Server ▪ MySQL, PostgreSQL and others SQL is a set-based language SQL is structured like English ▪ SEQUEL (Simple English QUEry Language) VA is truly data rich Each site has between 2-4 TB of data† ▪ non-imaging data VA FileManager [FileMan] Only within VistA Limited functionality for non-programmers Runs in the production environment ▪ Workflow and resource concerns † - Pham R. The VA Data Lifecycle (Internals, Data Flows, and Business Intelligence) Data repositories (warehouses) Moves the analysis off-line Provides accessible, yet very secure, data ▪ ODBC, ADO and LINQ Greater functionality and access control ▪ The “official” data is secured within VistA ▪ Independent of VistA ▪ Microsoft Active Directory based access ▪ Maintain the front-line stance of CPRS/VistA Various VA relational databases Corporate Data Warehouse (CDW) Regional Data Warehouse (RDW) VISN Data Warehouse (VDW) Potential for local data warehouses ▪ Extract raw data from VistA off-hours ▪ UPSERT the data into an SQL server automatically ▪ Users run their own reports and access is secure Data warehouses = relational databases SQL = the language of relational databases Relational databases are everywhere Learn a portable skill Take a systems approach to collaboration Build reusable code Work more efficiently VA has an enterprise license Microsoft SQL Server is functionally superior to Microsoft Access Secure Backups Procedural code No database size limit (software) Efficiency Two of the different SQL flavors Transact SQL (Microsoft SQL Server) ▪ Manage databases/services, procedural code Jet SQL (Microsoft Access) ▪ Information retrieval, program component Each program has its own SQL flavor Core keywords and functions follow standards ANSI guidelines are “considered” In SQL: Something is either part of a set or it is not Order is meaningless SQL Server supports procedural code ▪ VBA can be used in MS Access for procedures Dimension tables Fact tables Contain attributes that describe records in fact tables Each record contains unique identifier Generally more stable Drug name, provider name, patient name and address ▪ Similar to Drug file (#50), New Person file (#200), Patient file (#2) Capture operational data Contain multiple unique identifiers One identifier for each record One identifier for each data element in dimension tables Similar to Fileman IEN Usually contain numerical and date fields Prescription fills, Outpatient visits, lab test results Similar to prescription file (#52), V POV (#9000010.07) Dim.LocalDrug LocalDrugSID Sta3n (3 digit station number) LocalDrugNameWithDose VaClassification PricePerDispenseUnit NationalDrugSID LocalDrug SID LocalDrugNameWithDose PricePer VaClassifi Dispense National LocalDrug cation Unit DrugSID SID 735055 ATENOLOL 50MG TAB CV100 0.0079 2223484 735055 735056 ATENOLOL 100MG TAB CV100 0.0075 2221541 735056 739331 ATENOLOL 25MG TAB CV100 0.0068 2223485 739331 741066 ATENOLOL 12.5MG (1/2 OF 25MG) TAB CV100 0.0042 2223485 741066 748289 STUDY: ATENOLOL TAB NULL -1 IN000 748289 Dim.NationalDrug NationalDrugSID DrugNameWithoutDose DosageForm Strength NationalDrugSID DrugNameWithoutDose DosageForm Strength 2221541 ATENOLOL TAB 100 2221542 ATENOLOL INJ 500 2221543 ATENOLOL TAB 50 2221544 ATENOLOL/CHLORTHALIDONE TAB NULL 2223485 ATENOLOL TAB 25 sPatient.sPatient PatientSID PatientName PatientSSN AddressLine1 City State ZipCode RxOut.RxOutpat RxOutpatSID PatientSID ProviderSID LocalDrugSID IssueDate RxNumber Rxout.RxoutpatFill RxoutpatFillSID RxOutpatSID FillType ReleaseDateTime Qty DaysSupply UnitPrice Metadata Information about data Includes CDW to VistA field crosswalk http://vaww.cdw.r02.med.va.gov/metadata/Reports/Form s/AllItems.aspx Clauses perform set-based calculations CLAUSE SELECT Definition Defines what you will see. You can identify columns, calculations, literal text, and even perform branching. FROM Defines the origin of the data. Tables can be joined. WHERE Defines data filters. Filters data before it is queried. GROUP BY HAVING ORDER BY Defines how data is grouped. Works hand in hand with aggregate functions, i.e. how many in each group Defines data filters. Filters data after the query is run. An example: return only those items having a count > 5. Defines the field(s) used to sort the data – ASC or DESC Function Definition COUNT Aggregate Function: Counts the number in a group. SUM Aggregate Function: Sums the numbers in a group AVG Aggregate Function: Averages numbers in a group MIN and MAX Aggregate Function: Finds the minimum and maximum numeric values in a group. UCASE and LCASE Scalar Function: Returns text in uppercase or lowercase, respectively LEN Scalar Function: Returns the length of a string Microsoft Access supports the aggregate functions FIRST and LAST SQL Server has a COUNT(DISTINCT...) aggregate function Operator AND OR NOT =, >, <, >=, <= Definition Logical Operator: A and B are both true to equal TRUE Logical Operator: A or B is true to equal TRUE Logical Operator: Switches TRUE to FALSE and vice versa Equality Operators: equals, greater than, etc. BETWEEN Equality Operator: TRUE if value is between the two criteria. Values that equal the criteria are included. Like Equality Operator: TRUE if text contains the pattern defined. LIKE allows for wildcard characters (see below) +, -, *, / + or & Mathematical Operators Concatenation Operators: Join two text strings. (see below) Access uses “*” as a wildcard and double quotes; SQL Server uses “%” and single quotes Access uses the ampersand to concatenate “&”, SQL Server the plus sign “+” Select LocalDrugSID ,NationalDrugSID ,LocalDrugNameWithDose from vdw.dim.localdrug Where LocalDrugNameWithDose like '%atenolol%' Case is not normally important, but helps with readability Spaces, tabs, and hard returns have limited importance Important in text fields Not important between commands Help improve readability Microsoft query analyzer color codes key words for readability SQL databases have many different data types Varchar Int ▪ Bigint ▪ Smallint Decimal Float Datetime Money Select Main command used to retrieve data from SQL tables Followed by list of fields to return ‘*’ (without quotes) returns all fields in table ▪ Normally, avoid using ‘*’ unless you need all the fields in a table From vdw.dim.localdrug Indicates source of data from which you are selecting Vdw.dim.localdrug ▪ VDW is database name ▪ VDW => VISN Data Warehouse ▪ RDW => Regional Data Warehouse ▪ CDW => Corporate Data Warehouse ▪ Dim is schema ▪ Localdrug is table name Where Allows you to restrict or filter the table contents to just the results you want Extremely versatile Like ‘%’ is a multicharacter wildcard ‘%atenolol%’ - contains ‘atenolol’ ‘atenolol%’ – begins with ‘atenolol’ ‘%atenolol’ – ends with ‘atenolol’ Case is not normally important, but helps with readability Spaces, tabs, and hard returns have limited importance Important in text fields Not important between commands Help improve readability Microsoft query analyzer color codes key words for readability Select LocalDrugSID ,NationalDrugSID ,LocalDrugNameWithDose from vdw.dim.localdrug Where localdrugnamewithdose like '%atenolol%' LocalDrugSID NationalDrugSID LocalDrugNameWithDose 735055 2223484 ATENOLOL 50MG TAB 735056 2221541 ATENOLOL 100MG TAB 739331 2223485 ATENOLOL 25MG TAB 741066 2223485 ATENOLOL 12.5MG (1/2 OF 25MG) TAB 1200004446 2223484 STUDY: ATENOLOL 50MG TAB Select VA FileMan Option: PRINT File Entries OUTPUT FROM WHAT FILE: DRUG// SORT BY: GENERIC NAME// 'GENERIC NAME["ATENOLOL" WITHIN GENERIC NAME["ATENOLOL", SORT BY: FIRST PRINT FIELD: NUMBER THEN PRINT FIELD: PSNDF:NUMBER THEN PRINT FIELD: GENERIC NAME THEN PRINT FIELD: DRUG LIST PAGE 1 NUMBER NUMBER GENERIC NAME -----------------------------------------------------------------------3217 4331 ATENOLOL 25MG TAB 4068 4329 ATENOLOL 50MG TAB 4069 4328 ATENOLOL 100MG TAB 9751 4331 ATENOLOL 12.5MG (1/2 OF 25MG) TAB 12975 4329 STUDY: ATENOLOL 50MG TAB Used to connect two or more tables together using data elements that are shared by individual tables Similar to fileman jumps, with more power Inner joins, outer joins and full joins Produces a result that includes only the records that are the same in both tables Select a.localdrugnamewithdose , b.strength from vdw.dim.localdrug as a inner join vdw.dim.nationaldrug b on a.NationalDrugSID = b.NationalDrugSID where LocalDrugNameWithDose like ‘%atenolol%’ localdrugsid localdrugnamewithdose strength 735055 ATENOLOL 50MG TAB 50 735056 ATENOLOL 100MG TAB 100 739331 ATENOLOL 25MG TAB 25 741066 ATENOLOL 12.5MG (1/2 OF 25MG) TAB 25 Results contain all records in one table and only matching records in another table Left joins – return all results in table specified in ‘from’ statement and all matching tables from ‘joined’ table Right joins – return all results in table specified in ‘joined’ table ‘from’ statement and all matching tables from ‘from’ statement Select a.localdrugnamewithdose , b.strength from vdw.dim.localdrug as a left outer join vdw.dim.nationaldrug b on a.NationalDrugSID = b.NationalDrugSID where LocalDrugNameWithDose like ‘%atenolol%’ localdrugsid localdrugnamewithdose strength 735055 ATENOLOL 50MG TAB 50 735056 ATENOLOL 100MG TAB 100 739331 ATENOLOL 25MG TAB 25 741066 ATENOLOL 12.5MG (1/2 OF 25MG) TAB 25 1200004446 STUDY: ATENOLOL 50MG TAB NULL Indicates lack of data Does *not* mean zero Math operations with null values have null results ▪ 2*null = null ▪ 0 * null = null Text and logic operations on null values return null Several functions exist to deal specifically with null values Returns all values in both tables Not particularly applicable in our setting http://www.w3schools.com/sql/ w3schools is an excellent learning resource http://msdn.microsoft.com Microsoft Developer Network is the “go to” place for all coding questions for Microsoft SAMS Teach Yourself SQL in 10-Minutes Excellent starting point (uses older SQL style) SAMS Teach Yourself Microsoft SQL Server T-SQL in 10-Minutes (good intro for Microsoft SQL Server) \\vhacdwa10\BIPLTrainingContent\CDW_Data_101 Recorded Live Meetings http://vaww.cdw.r02.med.va.gov/Pages/ CDW Home CDW Best practices MetaData Relationship Diagrams Additional Training VA TMS Books 24x7 Forta, Ben. SAMS Teach Yourself Microsoft SQL Server T-SQL in 10-Minutes. Indianapolis, IN: SAMS, 2008. Print. Forta, Ben. SAMS Teach Yourself SQL in 10-Minutes –Third Edition. Indianapolis, IN: SAMS, 2004. Print. Kline, Kevin, Hunt, Brand, and Kline, Daniel. SQL in a Nutshell –Third Edition. Cambridge, MA: O’Reilly Media, 2008. Molinaro, Anthony. SQL Cookbook. Cambridge, MA: O’Reilly Media, 2005. Henderson, Ken. The guru's guide to Transact-SQL. Boston: Addison-Wesley, 2000. Print. Roman Steven. Access Database Design & Programming –Third Edition. Cambridge, MA: O’Reilly Media, 2002. 11-124. Print Bluttman Ken, and Freeze Wayne S. Access Data Analysis Cookbook. Cambridge, MA: O’Reilly Media, 2007. 1-106. Print Getz, Ken, Litwin, Paul, and Baron, Andy. Access Cookbook Cambridge, MA: O’Reilly Media, 2002. 1-56. Print Date CJ. SQL and Relational Theory. Cambridge, MA: O’Reilly Media, 2009. http://www.w3schools.com/sql/ http://msdn.microsoft.com