IELM 511: Information System design

Introduction
Part I. ISD for well structured data: relational and other DBMS
  - Info storage (modeling, normalization)
  - Info retrieval (Relational algebra, Calculus, SQL)
  - DB integrated API's
Part II. ISD for systems with non-uniformly structured data
  - Basics of web-based IS (www, web2.0, …)
  - Markup's, HTML, XML
  - Design tools for Info Sys: UML
Part III: (one out of)
  - API's for mobile apps
  - Security, Cryptography
  - IS product lifecycles
  - Algorithm analysis, P, NP, NPC

Agenda
  - Structured Query Language (SQL)
  - DB API's

Recall our Bank DB design
  BRANCH( b_name, city, assets)
  CUSTOMER( cssn, c_name, street, city, banker, banker_type)
  LOAN( l_no, amount, br_name)
  PAYMENT( l_no, pay_no, date, amount)
  EMPLOYEE( e_ssn, e_name, tel, start_date, mgr_ssn)
  ACCOUNT( ac_no, balance)
  SACCOUNT( ac_no, int_rate)
  CACCOUNT( ac_no, od_amt)
  BORROWS( cust_ssn, loan_num)
  DEPOSIT( c_ssn, ac_num, access_date)
  DEPENDENT( emp_ssn, dep_name)
  [Figure: lines connecting the schemas carry 1, n and m cardinality labels.]

Background: Structured Query Language
Basics of SQL: A DataBase Management System is an IT system.
Core requirements:
  - A structured way to store the definition of data [why ?]: DDL
  - Manipulation of data [obviously!]: DML
SQL: a combined DDL+DML

SQL as a DDL
A critical element of any design is to store the definitions of its components.
In DB design, we deal with tables, using table names, attribute names etc. Each of these terms should have unambiguous syntax and semantics.
A systematic way to specify and store these meta-data is by the use of a Data Definition Language (DDL).
The information about the data is stored in a Data Dictionary.
SQL provides a unified DDL + a Data Manipulation Language (DML).

SQL as a DDL: create command
A DB stores one or more tables and one or more indexes.
To create a new database:
    create database my_database;
A table stores data.
To create a new table:
    create table my_table (
        attribute_name attribute_type constraint,
        …,
        constraint,
        … );
To create an index on a table:
    create index my_index on my_table( attribute);
An index is a special file for faster DB look-up when searching the specified table for some data using the specified attribute.

SQL as a DDL: create command examples
    create database bank;

LOAN( l_no, amount, br_name)
    create table loan (
        l_no char(10),
        amount double,
        br_name char(30) references branch(b_name),
        primary key (l_no) );

BORROWS( cust_ssn, loan_num)
    create table borrows (
        cust_ssn char(11),
        loan_num char(10),
        primary key (cust_ssn, loan_num),
        constraint borrows_c1 foreign key (cust_ssn) references customer( cssn),
        constraint borrows_c2 foreign key (loan_num) references loan( l_no) );

Note on metadata: system catalogs
Metadata = data about data.
The DBMS manages a 'data dictionary', sometimes called the 'system catalog', which records:
  - When the DB and each table were created/modified
  - Name of each attribute, its data type, and comments describing it
  - List of all users who can access the DB, and their passwords
  - Which user can do what (read/add/update/delete/authorize) to the data
The system catalog is itself stored in a table, and users can see the data in it (if they have authority).

SQL as a DML: insert, drop commands
To add one row into a table:
    insert into branch values( 'Downtown', 'Brooklyn', 9000000);
    insert into loan values( 'L17', 1000, 'Downtown');
Notes:
  - char( ), date, datetime types: data must be quoted (standard SQL uses single quotes).
  - integer, single, double (number data types) are not quoted.
  - The sequence in which you execute 'insert' matters! The insert into 'loan' will fail unless table 'branch' already has a row with b_name 'Downtown'.
To remove an entire table from the DB:
    drop table branch;
Note: this 'drop' command will fail if, e.g., there is data in table 'loan' [why?]
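
The insert-order and drop behaviour described above can be tried out with a small script. This is only a sketch under assumptions: it uses Python's built-in sqlite3 module with an in-memory database rather than the DBMS used in class, it declares the keys of branch and loan directly on the columns, and the function name run_ddl_demo is made up for illustration. SQLite enforces foreign keys only after the pragma shown, and it exposes its system catalog as the table sqlite_master.

```python
import sqlite3


def run_ddl_demo():
    """Create branch and loan, then probe insert order, drop, and the catalog."""
    conn = sqlite3.connect(":memory:")        # throwaway in-memory database
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks references only when enabled
    conn.execute("""create table branch (
        b_name char(30) primary key, city char(30), assets double)""")
    conn.execute("""create table loan (
        l_no char(10) primary key, amount double,
        br_name char(30) references branch(b_name))""")

    outcome = {}

    # 1. Loan before branch: no 'Downtown' row exists yet, so the insert is rejected.
    try:
        conn.execute("insert into loan values ('L17', 1000, 'Downtown')")
        outcome["loan_before_branch"] = "accepted"
    except sqlite3.IntegrityError:
        outcome["loan_before_branch"] = "rejected"

    # 2. Branch first, then loan: both inserts succeed.
    conn.execute("insert into branch values ('Downtown', 'Brooklyn', 9000000)")
    conn.execute("insert into loan values ('L17', 1000, 'Downtown')")
    outcome["loan_rows"] = conn.execute("select count(*) from loan").fetchone()[0]

    # 3. Dropping branch while a loan row still refers to it is refused.
    try:
        conn.execute("drop table branch")
        outcome["drop_branch"] = "accepted"
    except sqlite3.Error:
        outcome["drop_branch"] = "rejected"

    # 4. The system catalog lists the tables that exist right now.
    outcome["tables"] = sorted(
        row[0] for row in conn.execute(
            "select name from sqlite_master where type = 'table'"))
    conn.close()
    return outcome


if __name__ == "__main__":
    print(run_ddl_demo())
```

Other DBMSs report these failures with their own error codes, so the exception types here are specific to sqlite3.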

SQL as a DML: select command
To get some data from a (set of) table(s):
    select attribute1, …, attribute_n                                       (required)
    from table_1, …, table_m                                                (required)
    where selection_or_join_condition1, …, selection_or_join_condition_r    (optional)
    group by attribute_i                                                    (optional)
    having aggregate_function( attribute_j, … )                             (optional)
    order by attribute_k                                                    (optional)

SQL as a DML: select command
To get some data from a (set of) table(s):
    select customer, loan_no from borrows;
    select * from borrows;

    customer      loan_no
    111-12-0000   L17
    222-12-0000   L23
    333-12-0000   L15
    444-00-0000   L93
    666-12-0000   L17
    111-12-0000   L11
    999-12-0000   L17
    777-12-0000   L16

    select customer as "customer ssn" from borrows;

    customer ssn
    111-12-0000
    222-12-0000
    333-12-0000
    444-00-0000
    666-12-0000
    111-12-0000
    999-12-0000
    777-12-0000

    select distinct loan_no from borrows;

    loan_no
    L17
    L23
    L15
    L93
    L11
    L16

Notes:
  - * is a wildcard
  - as: gives alias name to attribute

SQL select: row filters
Example: Find the names of all branches that have given loans larger than 1200
    select distinct branch_name from loan where amount > 1200
Note: all operations in 'where' are applied one row at a time

    LOAN
    loan_number   amount   branch_name
    L17           1000     Downtown
    L23           2000     Redwood
    L15           1500     Pennyridge
    L93           500      Mianus
    L11           900      Round Hill
    L16           1300     Pennyridge

    Result
    branch_name
    Redwood
    Pennyridge

SQL select: joins
Example: Find the customer ssn, loan no, amount and branch name for all loans > 1200
    select customer, loan.*
    from borrows, loan
    where loan_no = loan_number and amount > 1200
WHERE clause: multiple q-conditions can be combined with and, or, not; cell values are compared with >, =, !=, <, etc.

    LOAN
    loan_number   amount   branch_name
    L17           1000     Downtown
    L23           2000     Redwood
    L15           1500     Pennyridge
    L93           500      Mianus
    L11           900      Round Hill
    L16           1300     Pennyridge

    BORROWS
    customer      loan_no
    111-12-0000   L17
    222-12-0000   L23
    333-12-0000   L15
    444-00-0000   L93
    666-12-0000   L17
    111-12-0000   L11
    999-12-0000   L17
    777-12-0000   L16

In the where clause, loan_no = loan_number is the q-condition for the join of loan and borrows; amount > 1200 is the selection condition.

    Result
    customer      loan_number   amount   branch_name
    222-12-0000   L23           2000     Redwood
    333-12-0000   L15           1500     Pennyridge
    777-12-0000   L16           1300     Pennyridge

SQL select: joins with table and column aliases
Example: Find the names of employees and their manager.

    EMPLOYEE (E = M)
    e_ssn         e_name   tel     start_date   mgr_ssn
    111-22-3333   Jones    12345   Nov-2005     321-32-4321
    333-11-4444   Smith    54321   Mar-1998     111-22-3333
    123-45-6789   Lee      54321   Mar-1998     111-22-3333
    555-66-8888   Turner   55555   Aug-2002     321-32-4321
    987-65-4321   Jones    87621   Mar-1995     888-99-9999
    888-99-9999   Chan     87654   Feb-1980     777-77-7777
    321-32-4321   Adams    77777   Feb-1990     777-77-7777
    777-77-7777   Black    99111   Jan-1980     null

    select E.e_name as worker, M.e_name as boss
    from employee as E, employee as M
    where E.mgr_ssn = M.e_ssn
Note: E, M are aliases (copies) of the employee table

    Result
    worker   boss
    Jones    Adams
    Smith    Jones
    Lee      Jones
    Turner   Adams
    Jones    Chan
    Chan     Black
    Adams    Black

Note: Black has mgr_ssn = null, which never equals any e_ssn, so this join returns no row for Black. The row (Black, null) appears only if a left outer join is used instead.

SQL select: nested queries, in
Example: Find ssn of customers who have both deposit and loan

    DEPOSIT
    c_ssn         ac_num   accessDate
    888-12-0000   A101     Jan 1, 09
    222-12-0000   A215     Feb 1, 09
    333-12-0000   A102     Feb 28, 09
    555-00-0000   A305     Mar 10, 09
    888-12-0000   A201     Mar 1, 98
    111-12-0000   A217     Mar 1, 09
    000-12-0000   A101     Feb 25, 09

    (BORROWS as shown above)

    select c_ssn from deposit
    where c_ssn in ( select customer from borrows)
Note: 'in' performs a set membership test

    Result
    c_ssn
    222-12-0000
    333-12-0000
    111-12-0000

SQL select: nested queries, not in
Example: Find ssn of customers who have a deposit but no loan (DEPOSIT and BORROWS as in the previous example)
    select c_ssn from deposit
    where c_ssn not in ( select customer from borrows)
Note: 'not in' is true if 'in' is false.

    Result
    c_ssn
    888-12-0000
    555-00-0000
    888-12-0000
    000-12-0000

SQL select: nested, correlated queries, exists
Existential quantifier (a generalization of 'in')
Example: Find the names of branches that have given no loan

    BRANCH
    branch_name   city         assets
    Downtown      Brooklyn     9000000
    Redwood       Palo Alto    2100000
    Pennyridge    Horseneck    1700000
    Mianus        Horseneck    400000
    Round Hill    Horseneck    8000000
    Pownal        Bennington   300000
    North Town    Rye          3700000
    Brighton      Brooklyn     7100000

    (LOAN as shown above)

    select branch_name from branch
    where not exists ( select * from loan
                       where branch.branch_name = loan.branch_name)
1. Correlated: the 'where' clause of the inner query refers to the outer query
2. 'exists' is true if evaluating the inner query yields >= 1 row; 'not exists' is true if 'exists' is false

SQL select: arithmetic operations on columns
Example: Report the branch name and assets in units of millions (BRANCH as shown above)
    select branch_name, assets*0.000001 as "assets (m)" from branch
Note: arithmetic ops can be used in SELECT, WHERE, HAVING

    Result
    branch_name   assets (m)
    Downtown      9.0
    Redwood       2.1
    Pennyridge    1.7
    Mianus        0.4
    Round Hill    8.0
    Pownal        0.3
    North Town    3.7
    Brighton      7.1

SQL select: group by, group-wise aggregation functions
Example: Report the average, maximum amount, and number of loans by branch (LOAN as shown above)
    select branch_name, avg( amount) as Avg, max( amount) as Max,
           count( branch_name) as no_loans
    from loan
    group by branch_name
    order by no_loans desc
1. Aggregating functions: avg, max, min, sum, count
2. avg/max return the average/max for each group

    Result
    branch_name   Avg    Max    no_loans
    Pennyridge    1400   1500   2
    Downtown      1000   1000   1
    Redwood       2000   2000   1
    Mianus        500    500    1
    Round Hill    900    900    1

SQL select: group by, having
'having' is used to screen out groups from the output
Example: Report the small loans (<= 1500) held by 2 or more people.

    LOAN
    loan_number   amount   branch_name
    L17           1000     Downtown
    L23           2000     Redwood
    L15           1500     Pennyridge
    L93           500      Mianus
    L11           900      Round Hill
    L16           1300     Pennyridge

    BORROWS
    customer      loan_no
    111-12-0000   L17
    222-12-0000   L23
    333-12-0000   L15
    444-00-0000   L93
    666-12-0000   L17
    111-12-0000   L11
    999-12-0000   L17
    777-12-0000   L16

    select loan_number, amount, count( loan_number) as no_debtors
    from loan, borrows
    where loan_number = loan_no and amount <= 1500
    group by loan_number
    having count(loan_number) >= 2

    Result
    loan_number   amount   no_debtors
    L17           1000     3

Notes:
  - 'having' conditions are applied only after the rows have been grouped
  - 'order by' used with 'group by' is applied to the groups

SQL select: date functions
SQL provides special functions to handle dates, times and strings
Example: Report those customers who have been inactive for over 5 years
    select c_ssn from deposit
    where datediff( yy, accessDate, getdate( ) ) > 5
datediff units: yy (years), …, ns (nano-seconds). datediff and getdate are Microsoft SQL Server functions; other DBMSs provide their own date functions.

    DEPOSIT
    c_ssn         ac_num   accessDate
    888-12-0000   A101     Jan 1, 09
    222-12-0000   A215     Feb 1, 09
    333-12-0000   A102     Feb 28, 09
    555-00-0000   A305     Mar 10, 09
    888-12-0000   A201     Mar 1, 98
    111-12-0000   A217     Mar 1, 09
    000-12-0000   A101     Feb 25, 09

    Row satisfying the condition
    c_ssn         ac_num   accessDate
    888-12-0000   A201     Mar 1, 98

SQL select: string functions
It is often useful to use wild-cards for string matching
    select ssn, name, street, city from customer
    where name LIKE 'J%'
       or street LIKE '[^mnp]%'
       or city LIKE '%[ ]%'
Wildcards:
  - %       matches zero or more chars
  - [asd]   matches one char out of the list a, s, d
  - [^asd]  matches any one char except a, s, d

    CUSTOMER
    ssn           name       street    city         banker        b_type
    111-12-0000   Jones      Main      Harrison     321-32-4321   CRM
    222-12-0000   Smith      North     Rye          321-32-4321   CRM
    333-12-0000   Hayes      Main      Harrison     321-32-4321   CRM
    444-12-0000   Curry      North     Rye          333-11-4444   LO
    555-12-0000   Turner     Putnam    Stamford     888-99-9999   DO
    666-12-0000   Williams   Nassau    Princeton    333-11-4444   LO
    777-12-0000   Adams      Spring    Pittsfield   123-45-6789   LO
    888-12-0000   Johnson    Alma      Palo Alto    888-99-9999   DO
    999-12-0000   Brooks     Senator   Brooklyn     123-45-6789   LO
    000-12-0000   Lindsay    Park      Pittsfield   888-99-9999   DO

    Result
    ssn           name      street    city
    111-12-0000   Jones     Main      Harrison
    777-12-0000   Adams     Spring    Pittsfield
    888-12-0000   Johnson   Alma      Palo Alto
    999-12-0000   Brooks    Senator   Brooklyn

SQL as a DML: update command
To modify an entry in a cell:
    update loan set amount = amount - 200
    where loan_number in ( select loan_no from borrows, customer
                           where customer = ssn and name = 'Jones' )
Note: Jones holds two loans (L17 and L11), so the subquery returns two rows; 'in' is needed because '=' accepts only a single-row subquery.

    BORROWS
    customer      loan_no
    111-12-0000   L17
    222-12-0000   L23
    333-12-0000   L15
    444-00-0000   L93
    666-12-0000   L17
    111-12-0000   L11
    999-12-0000   L17
    777-12-0000   L16

    LOAN (before the update)
    loan_number   amount   branch_name
    L17           1000     Downtown
    L23           2000     Redwood
    L15           1500     Pennyridge
    L93           500      Mianus
    L11           900      Round Hill
    L16           1300     Pennyridge

    (CUSTOMER as shown above)

    select * from loan

    LOAN (after the update)
    loan_number   amount   branch_name
    L17           800      Downtown
    L23           2000     Redwood
    L15           1500     Pennyridge
    L93           500      Mianus
    L11           700      Round Hill
    L16           1300     Pennyridge

SQL as a DML: delete command
To delete a row from a table:
    delete from loan
        all rows of the loan table are deleted
    delete from customer where name = 'Jones'
        request to delete the row of the customer table with name = 'Jones' [will it succeed ?]
(BORROWS and CUSTOMER as shown above; note that BORROWS still refers to Jones's ssn 111-12-0000.)

Views in SQL
A view is a virtual table defined on a given database. The columns of the view are either
  (i) columns from some (actual or virtual) table of the DB, or
  (ii) columns that are computed (from other columns)
Main uses of a view:
  - Security (selective display of information to different users)
  - Ease-of-use
    -- Explicit display of derived attributes
    -- Explicit display of related information from different tables
    -- Intermediate table can be used to simplify SQL query

Views in SQL..
Create a view showing the names of employees, their ssn, telephone number, their manager's name, and how many years they have worked in the bank.

    create view bank_employee as
    select e.e_ssn as ssn, e.e_name as name, e.tel as phone,
           m.e_name as manager,
           datediff( yy, e.start_date, getdate( )) as n_years
    from EMPLOYEE as e, EMPLOYEE as m
    where e.mgr_ssn = m.e_ssn

    select * from bank_employee

    ssn           name     phone   manager   n_years
    111-22-3333   Jones    12345   Adams     5
    333-11-4444   Smith    54321   Jones     12
    123-45-6789   Lee      54321   Jones     12
    555-66-8888   Turner   55555   Adams     8
    987-65-4321   Jones    87621   Chan      15
    888-99-9999   Chan     87654   Black     30
    321-32-4321   Adams    77777   Black     20

Note: Black (777-77-7777, mgr_ssn = null) does not appear, because null never satisfies e.mgr_ssn = m.e_ssn. A left outer join would add the row (Black, 99111, null, 30).

Operations on Views
The view definition is persistent: once you define it, the definition stays in the DB until you drop the view.
The DBMS computes the data in a view only when the view is referenced in a SQL command (e.g. in a select … command); no physical table corresponding to the view is kept in storage.
You can use the view in any SQL query just the same as any other table, BUT
  (1) You cannot modify the value of a computed attribute
  (2) If an update/delete command is executed on the view, the underlying data in the table referenced by the view is updated/deleted. [this can cause unexpected changes in your DB]

Concluding remarks on SQL
  - SQL has some other useful commands and operators.
  - In addition, most DBMSs provide many non-standard operators and services to facilitate information system deployment and administration.
  - DBMSs can handle very large amounts of data and process queries very fast. IBM's DB2 can handle over 6m transactions per min (tpm); Oracle 10g, over 4m tpm.
  - To speed up queries, you can use indexes.
  - Common DBMSs: IBM DB2, Oracle 10g, Microsoft SQL Server, Sybase, MySQL. All support SQL.

Database API's
Most people use DBs, but always through some computer program interface (API).
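
As a rough illustration of using a DB through a program interface, the sketch below goes through Python's built-in sqlite3 module, standing in for whatever library a given DBMS ships. It defines a simplified bank_employee view and reads from it. Assumptions to note: the function name fetch_bank_employees is hypothetical, only four EMPLOYEE rows are loaded, and the n_years column is left out because SQLite has no datediff or getdate.

```python
import sqlite3

# (e_ssn, e_name, tel, mgr_ssn): four rows taken from the EMPLOYEE table
EMPLOYEES = [
    ("111-22-3333", "Jones", "12345", "321-32-4321"),
    ("333-11-4444", "Smith", "54321", "111-22-3333"),
    ("321-32-4321", "Adams", "77777", "777-77-7777"),
    ("777-77-7777", "Black", "99111", None),  # None is stored as SQL null
]


def fetch_bank_employees():
    """Connect, define the view, send a query, and return the rows."""
    conn = sqlite3.connect(":memory:")  # connect to an in-memory database
    conn.execute(
        "create table employee ("
        "e_ssn text primary key, e_name text, tel text, mgr_ssn text)")
    conn.executemany("insert into employee values (?, ?, ?, ?)", EMPLOYEES)

    # Only the definition is stored; rows are computed when the view is queried.
    conn.execute("""create view bank_employee as
        select e.e_ssn as ssn, e.e_name as name, e.tel as phone,
               m.e_name as manager
        from employee as e, employee as m
        where e.mgr_ssn = m.e_ssn""")

    # Send a SQL command; the response comes back as a list of Python tuples.
    rows = conn.execute(
        "select name, manager from bank_employee order by name").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, manager in fetch_bank_employees():
        print(name, "reports to", manager)
```

Black is missing from the returned rows, since a null mgr_ssn matches no e_ssn in the join.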
Most DBMSs provide program 'libraries' (collections of compiled functions) with functions to:
  - Connect to the DBMS
  - Select a DB
  - Send a SQL command, and receive the response in some standard data structure.
Each DBMS provides one library for each programming language.
On Windows™ (and several other) systems, these libraries are called ODBC.
[Figure: a client app (your code, odbc func, more code) sends a SQL query through the odbc DLL to the DBMS and its DB, and receives the response.]

Bank tables..

    BRANCH
    branch_name   city         assets
    Downtown      Brooklyn     9000000
    Redwood       Palo Alto    2100000
    Pennyridge    Horseneck    1700000
    Mianus        Horseneck    400000
    Round Hill    Horseneck    8000000
    Pownal        Bennington   300000
    North Town    Rye          3700000
    Brighton      Brooklyn     7100000

    EMPLOYEE
    e_ssn         e_name   tel     start_date   mgr_ssn
    111-22-3333   Jones    12345   Nov-2005     321-32-4321
    333-11-4444   Smith    54321   Mar-1998     111-22-3333
    123-45-6789   Lee      54321   Mar-1998     111-22-3333
    555-66-8888   Turner   55555   Aug-2002     321-32-4321
    987-65-4321   Jones    87621   Mar-1995     888-99-9999
    888-99-9999   Chan     87654   Feb-1980     777-77-7777
    321-32-4321   Adams    77777   Feb-1990     777-77-7777
    777-77-7777   Black    99111   Jan-1980     null

    CUSTOMER
    ssn           name       street    city         banker        b_type
    111-12-0000   Jones      Main      Harrison     321-32-4321   CRM
    222-12-0000   Smith      North     Rye          321-32-4321   CRM
    333-12-0000   Hayes      Main      Harrison     321-32-4321   CRM
    444-12-0000   Curry      North     Rye          333-11-4444   LO
    555-12-0000   Turner     Putnam    Stamford     888-99-9999   DO
    666-12-0000   Williams   Nassau    Princeton    333-11-4444   LO
    777-12-0000   Adams      Spring    Pittsfield   123-45-6789   LO
    888-12-0000   Johnson    Alma      Palo Alto    888-99-9999   DO
    999-12-0000   Brooks     Senator   Brooklyn     123-45-6789   LO
    000-12-0000   Lindsay    Park      Pittsfield   888-99-9999   DO

    DEPOSIT
    c_ssn         ac_num   accessDate
    888-12-0000   A101     Jan 1, 09
    222-12-0000   A215     Feb 1, 09
    333-12-0000   A102     Feb 28, 09
    555-00-0000   A305     Mar 10, 09
    888-12-0000   A201     Mar 1, 98
    111-12-0000   A217     Mar 1, 09
    000-12-0000   A101     Feb 25, 09

    LOAN
    loan_number   amount   branch_name
    L17           1000     Downtown
    L23           2000     Redwood
    L15           1500     Pennyridge
    L93           500      Mianus
    L11           900      Round Hill
    L16           1300     Pennyridge

    BORROWS
    customer      loan_no
    111-12-0000   L17
    222-12-0000   L23
    333-12-0000   L15
    444-00-0000   L93
    666-12-0000   L17
    111-12-0000   L11
    999-12-0000   L17
    777-12-0000   L16

Not all tables of our normalized design are shown; please create and populate them for practice.

References and Further Reading
Silberschatz, Korth, Sudarshan, Database System Concepts, McGraw Hill

Next: IS for non-structured data
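
Practice sketch: one possible way to create and populate two of the bank tables and rerun a few of the lecture's queries. It assumes Python's built-in sqlite3 module and an in-memory database; the names build_bank_db and run_practice_queries are made up for this illustration, and foreign keys are left out to keep it short.

```python
import sqlite3

LOAN_ROWS = [
    ("L17", 1000, "Downtown"), ("L23", 2000, "Redwood"),
    ("L15", 1500, "Pennyridge"), ("L93", 500, "Mianus"),
    ("L11", 900, "Round Hill"), ("L16", 1300, "Pennyridge"),
]
BORROWS_ROWS = [
    ("111-12-0000", "L17"), ("222-12-0000", "L23"), ("333-12-0000", "L15"),
    ("444-00-0000", "L93"), ("666-12-0000", "L17"), ("111-12-0000", "L11"),
    ("999-12-0000", "L17"), ("777-12-0000", "L16"),
]


def build_bank_db():
    """Return an in-memory connection holding the LOAN and BORROWS tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table loan ("
                 "loan_number text primary key, amount integer, branch_name text)")
    conn.execute("create table borrows (customer text, loan_no text, "
                 "primary key (customer, loan_no))")
    conn.executemany("insert into loan values (?, ?, ?)", LOAN_ROWS)
    conn.executemany("insert into borrows values (?, ?)", BORROWS_ROWS)
    return conn


def run_practice_queries(conn):
    """Run a row filter, a join, and a group by / having query."""
    results = {}
    # Row filter: branches that have given loans larger than 1200.
    results["big_loan_branches"] = [row[0] for row in conn.execute(
        "select distinct branch_name from loan "
        "where amount > 1200 order by branch_name")]
    # Join: customer and loan details for loans larger than 1200.
    results["big_loans"] = conn.execute(
        "select customer, loan_number, amount, branch_name "
        "from borrows, loan "
        "where loan_no = loan_number and amount > 1200 "
        "order by loan_number").fetchall()
    # Group by / having: small loans (<= 1500) held by 2 or more people.
    results["shared_small_loans"] = conn.execute(
        "select loan_number, amount, count(*) as no_debtors "
        "from loan, borrows "
        "where loan_number = loan_no and amount <= 1500 "
        "group by loan_number, amount "
        "having count(*) >= 2").fetchall()
    return results


if __name__ == "__main__":
    connection = build_bank_db()
    for label, rows in run_practice_queries(connection).items():
        print(label, rows)
    connection.close()
```

The remaining tables (BRANCH, CUSTOMER, DEPOSIT, EMPLOYEE) can be added the same way to try the nested and correlated queries.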